Supervised Learning - Foundations Project: ReCell¶

Problem Statement¶

Business Context¶

Buying and selling used phones and tablets used to be something that happened on a handful of online marketplace sites. But the used and refurbished device market has grown considerably over the past decade, and a new IDC (International Data Corporation) forecast predicts that the used phone market would be worth \$52.7bn by 2023 with a compound annual growth rate (CAGR) of 13.6% from 2018 to 2023. This growth can be attributed to an uptick in demand for used phones and tablets that offer considerable savings compared with new models.

Refurbished and used devices continue to provide cost-effective alternatives to both consumers and businesses that are looking to save money when purchasing one. There are plenty of other benefits associated with the used device market. Used and refurbished devices can be sold with warranties and can also be insured with proof of purchase. Third-party vendors/platforms, such as Verizon, Amazon, etc., provide attractive offers to customers for refurbished devices. Maximizing the longevity of devices through second-hand trade also reduces their environmental impact and helps in recycling and reducing waste. The impact of the COVID-19 outbreak may further boost this segment as consumers cut back on discretionary spending and buy phones and tablets only for immediate needs.

Objective¶

The rising potential of this comparatively under-the-radar market fuels the need for an ML-based solution to develop a dynamic pricing strategy for used and refurbished devices. ReCell, a startup aiming to tap the potential in this market, has hired you as a data scientist. They want you to analyze the data provided and build a linear regression model to predict the price of a used phone/tablet and identify factors that significantly influence it.

Data Description¶

The data contains the different attributes of used/refurbished phones and tablets. The data was collected in the year 2021. The detailed data dictionary is given below.

  • brand_name: Name of manufacturing brand
  • os: OS on which the device runs
  • screen_size: Size of the screen in cm
  • 4g: Whether 4G is available or not
  • 5g: Whether 5G is available or not
  • main_camera_mp: Resolution of the rear camera in megapixels
  • selfie_camera_mp: Resolution of the front camera in megapixels
  • int_memory: Amount of internal memory (ROM) in GB
  • ram: Amount of RAM in GB
  • battery: Energy capacity of the device battery in mAh
  • weight: Weight of the device in grams
  • release_year: Year when the device model was released
  • days_used: Number of days the used/refurbished device has been used
  • normalized_new_price: Normalized price of a new device of the same model in euros
  • normalized_used_price: Normalized price of the used/refurbished device in euros

Importing necessary libraries¶

In [1]:
import warnings # ignore warnings
warnings.filterwarnings('ignore')
In [2]:
# Installing the libraries with the specified version.
# uncomment and run the following line if Google Colab is being used
# !pip install scikit-learn==1.2.2 seaborn==0.13.1 matplotlib==3.7.1 numpy==1.25.2 pandas==1.5.3 -q --user
In [3]:
# Installing the libraries with the specified version.
# uncomment and run the following lines if Jupyter Notebook is being used
!pip install scikit-learn==1.2.2 seaborn==0.11.1 matplotlib==3.3.4 numpy==1.24.3 pandas==1.5.2 -q --user
  error: subprocess-exited-with-error
  
  python setup.py bdist_wheel did not run successfully.
  exit code: 1
  
  [592 lines of output]
  
  Edit setup.cfg to change the build options; suppress output with --quiet.
  
  BUILDING MATPLOTLIB
    matplotlib: yes [3.3.4]
        python: yes [3.11.7 | packaged by Anaconda, Inc. | (main, Dec 15 2023,
                    18:05:47) [MSC v.1916 64 bit (AMD64)]]
      platform: yes [win32]
   sample_data: yes [installing]
         tests: no  [skipping due to configuration]
        macosx: no  [Mac OS-X only]
  
  C:\Users\otroc\anaconda3\Lib\site-packages\setuptools\__init__.py:80: _DeprecatedInstaller: setuptools.installer and fetch_build_eggs are deprecated.
  !!
  
          ********************************************************************************
          Requirements should be satisfied by a PEP 517 installer.
          If you are using pip, you can try `pip install --use-pep517`.
          ********************************************************************************
  
  !!
    dist.fetch_build_eggs(dist.setup_requires)
  C:\Users\otroc\anaconda3\Lib\site-packages\setuptools\dist.py:700: SetuptoolsDeprecationWarning: The namespace_packages parameter is deprecated.
  !!
  
          ********************************************************************************
          Please replace its usage with implicit namespaces (PEP 420).
  
          See https://setuptools.pypa.io/en/latest/references/keywords.html#keyword-namespace-packages for details.
          ********************************************************************************
  
  !!
    ep.load()(self, ep.name, value)
  running bdist_wheel
  running build
  running build_py
  creating build
  creating build\lib.win-amd64-cpython-311
  copying lib\pylab.py -> build\lib.win-amd64-cpython-311
  creating build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\afm.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\animation.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\artist.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\axis.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\backend_bases.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\backend_managers.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\backend_tools.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\bezier.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\blocking_input.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\category.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\cm.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\collections.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\colorbar.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\colors.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\container.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\contour.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\dates.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\docstring.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\dviread.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\figure.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\fontconfig_pattern.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\font_manager.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\gridspec.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\hatch.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\image.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\legend.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\legend_handler.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\lines.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\markers.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\mathtext.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\mlab.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\offsetbox.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\patches.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\path.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\patheffects.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\pylab.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\pyplot.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\quiver.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\rcsetup.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\sankey.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\scale.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\spines.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\stackplot.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\streamplot.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\table.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\texmanager.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\text.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\textpath.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\ticker.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\tight_bbox.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\tight_layout.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\transforms.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\ttconv.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\type1font.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\units.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\widgets.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\_animation_data.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\_cm.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\_cm_listed.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\_color_data.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\_constrained_layout.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\_internal_utils.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\_layoutbox.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\_mathtext_data.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\_pylab_helpers.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\_text_layout.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\_version.py -> build\lib.win-amd64-cpython-311\matplotlib
  copying lib\matplotlib\__init__.py -> build\lib.win-amd64-cpython-311\matplotlib
  creating build\lib.win-amd64-cpython-311\mpl_toolkits
  copying lib\mpl_toolkits\__init__.py -> build\lib.win-amd64-cpython-311\mpl_toolkits
  creating build\lib.win-amd64-cpython-311\matplotlib\axes
  copying lib\matplotlib\axes\_axes.py -> build\lib.win-amd64-cpython-311\matplotlib\axes
  copying lib\matplotlib\axes\_base.py -> build\lib.win-amd64-cpython-311\matplotlib\axes
  copying lib\matplotlib\axes\_secondary_axes.py -> build\lib.win-amd64-cpython-311\matplotlib\axes
  copying lib\matplotlib\axes\_subplots.py -> build\lib.win-amd64-cpython-311\matplotlib\axes
  copying lib\matplotlib\axes\__init__.py -> build\lib.win-amd64-cpython-311\matplotlib\axes
  creating build\lib.win-amd64-cpython-311\matplotlib\backends
  copying lib\matplotlib\backends\backend_agg.py -> build\lib.win-amd64-cpython-311\matplotlib\backends
  copying lib\matplotlib\backends\backend_cairo.py -> build\lib.win-amd64-cpython-311\matplotlib\backends
  copying lib\matplotlib\backends\backend_gtk3.py -> build\lib.win-amd64-cpython-311\matplotlib\backends
  copying lib\matplotlib\backends\backend_gtk3agg.py -> build\lib.win-amd64-cpython-311\matplotlib\backends
  copying lib\matplotlib\backends\backend_gtk3cairo.py -> build\lib.win-amd64-cpython-311\matplotlib\backends
  copying lib\matplotlib\backends\backend_macosx.py -> build\lib.win-amd64-cpython-311\matplotlib\backends
  copying lib\matplotlib\backends\backend_mixed.py -> build\lib.win-amd64-cpython-311\matplotlib\backends
  copying lib\matplotlib\backends\backend_nbagg.py -> build\lib.win-amd64-cpython-311\matplotlib\backends
  copying lib\matplotlib\backends\backend_pdf.py -> build\lib.win-amd64-cpython-311\matplotlib\backends
  copying lib\matplotlib\backends\backend_pgf.py -> build\lib.win-amd64-cpython-311\matplotlib\backends
  copying lib\matplotlib\backends\backend_ps.py -> build\lib.win-amd64-cpython-311\matplotlib\backends
  copying lib\matplotlib\backends\backend_qt4.py -> build\lib.win-amd64-cpython-311\matplotlib\backends
  copying lib\matplotlib\backends\backend_qt4agg.py -> build\lib.win-amd64-cpython-311\matplotlib\backends
  copying lib\matplotlib\backends\backend_qt4cairo.py -> build\lib.win-amd64-cpython-311\matplotlib\backends
  copying lib\matplotlib\backends\backend_qt5.py -> build\lib.win-amd64-cpython-311\matplotlib\backends
  copying lib\matplotlib\backends\backend_qt5agg.py -> build\lib.win-amd64-cpython-311\matplotlib\backends
  copying lib\matplotlib\backends\backend_qt5cairo.py -> build\lib.win-amd64-cpython-311\matplotlib\backends
  copying lib\matplotlib\backends\backend_svg.py -> build\lib.win-amd64-cpython-311\matplotlib\backends
  copying lib\matplotlib\backends\backend_template.py -> build\lib.win-amd64-cpython-311\matplotlib\backends
  copying lib\matplotlib\backends\backend_tkagg.py -> build\lib.win-amd64-cpython-311\matplotlib\backends
  copying lib\matplotlib\backends\backend_tkcairo.py -> build\lib.win-amd64-cpython-311\matplotlib\backends
  copying lib\matplotlib\backends\backend_webagg.py -> build\lib.win-amd64-cpython-311\matplotlib\backends
  copying lib\matplotlib\backends\backend_webagg_core.py -> build\lib.win-amd64-cpython-311\matplotlib\backends
  copying lib\matplotlib\backends\backend_wx.py -> build\lib.win-amd64-cpython-311\matplotlib\backends
  copying lib\matplotlib\backends\backend_wxagg.py -> build\lib.win-amd64-cpython-311\matplotlib\backends
  copying lib\matplotlib\backends\backend_wxcairo.py -> build\lib.win-amd64-cpython-311\matplotlib\backends
  copying lib\matplotlib\backends\qt_compat.py -> build\lib.win-amd64-cpython-311\matplotlib\backends
  copying lib\matplotlib\backends\_backend_pdf_ps.py -> build\lib.win-amd64-cpython-311\matplotlib\backends
  copying lib\matplotlib\backends\_backend_tk.py -> build\lib.win-amd64-cpython-311\matplotlib\backends
  copying lib\matplotlib\backends\__init__.py -> build\lib.win-amd64-cpython-311\matplotlib\backends
  creating build\lib.win-amd64-cpython-311\matplotlib\cbook
  copying lib\matplotlib\cbook\deprecation.py -> build\lib.win-amd64-cpython-311\matplotlib\cbook
  copying lib\matplotlib\cbook\__init__.py -> build\lib.win-amd64-cpython-311\matplotlib\cbook
  creating build\lib.win-amd64-cpython-311\matplotlib\compat
  copying lib\matplotlib\compat\__init__.py -> build\lib.win-amd64-cpython-311\matplotlib\compat
  creating build\lib.win-amd64-cpython-311\matplotlib\projections
  copying lib\matplotlib\projections\geo.py -> build\lib.win-amd64-cpython-311\matplotlib\projections
  copying lib\matplotlib\projections\polar.py -> build\lib.win-amd64-cpython-311\matplotlib\projections
  copying lib\matplotlib\projections\__init__.py -> build\lib.win-amd64-cpython-311\matplotlib\projections
  creating build\lib.win-amd64-cpython-311\matplotlib\sphinxext
  copying lib\matplotlib\sphinxext\mathmpl.py -> build\lib.win-amd64-cpython-311\matplotlib\sphinxext
  copying lib\matplotlib\sphinxext\plot_directive.py -> build\lib.win-amd64-cpython-311\matplotlib\sphinxext
  copying lib\matplotlib\sphinxext\__init__.py -> build\lib.win-amd64-cpython-311\matplotlib\sphinxext
  creating build\lib.win-amd64-cpython-311\matplotlib\style
  copying lib\matplotlib\style\core.py -> build\lib.win-amd64-cpython-311\matplotlib\style
  copying lib\matplotlib\style\__init__.py -> build\lib.win-amd64-cpython-311\matplotlib\style
  creating build\lib.win-amd64-cpython-311\matplotlib\testing
  copying lib\matplotlib\testing\compare.py -> build\lib.win-amd64-cpython-311\matplotlib\testing
  copying lib\matplotlib\testing\conftest.py -> build\lib.win-amd64-cpython-311\matplotlib\testing
  copying lib\matplotlib\testing\decorators.py -> build\lib.win-amd64-cpython-311\matplotlib\testing
  copying lib\matplotlib\testing\disable_internet.py -> build\lib.win-amd64-cpython-311\matplotlib\testing
  copying lib\matplotlib\testing\exceptions.py -> build\lib.win-amd64-cpython-311\matplotlib\testing
  copying lib\matplotlib\testing\widgets.py -> build\lib.win-amd64-cpython-311\matplotlib\testing
  copying lib\matplotlib\testing\__init__.py -> build\lib.win-amd64-cpython-311\matplotlib\testing
  creating build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\conftest.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_afm.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_agg.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_agg_filter.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_animation.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_arrow_patches.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_artist.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_axes.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_backends_interactive.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_backend_bases.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_backend_cairo.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_backend_nbagg.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_backend_pdf.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_backend_pgf.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_backend_ps.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_backend_qt.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_backend_svg.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_backend_tk.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_backend_tools.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_backend_webagg.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_basic.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_bbox_tight.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_category.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_cbook.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_collections.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_colorbar.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_colors.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_compare_images.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_constrainedlayout.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_container.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_contour.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_cycles.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_dates.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_determinism.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_dviread.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_figure.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_fontconfig_pattern.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_font_manager.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_gridspec.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_image.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_legend.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_lines.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_marker.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_mathtext.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_matplotlib.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_mlab.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_offsetbox.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_patches.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_path.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_patheffects.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_pickle.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_png.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_polar.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_preprocess_data.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_pyplot.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_quiver.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_rcparams.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_sankey.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_scale.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_simplification.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_skew.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_sphinxext.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_spines.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_streamplot.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_style.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_subplots.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_table.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_testing.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_texmanager.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_text.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_ticker.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_tightlayout.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_transforms.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_triangulation.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_ttconv.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_type1font.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_units.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_usetex.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\test_widgets.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  copying lib\matplotlib\tests\__init__.py -> build\lib.win-amd64-cpython-311\matplotlib\tests
  creating build\lib.win-amd64-cpython-311\matplotlib\tri
  copying lib\matplotlib\tri\triangulation.py -> build\lib.win-amd64-cpython-311\matplotlib\tri
  copying lib\matplotlib\tri\tricontour.py -> build\lib.win-amd64-cpython-311\matplotlib\tri
  copying lib\matplotlib\tri\trifinder.py -> build\lib.win-amd64-cpython-311\matplotlib\tri
  copying lib\matplotlib\tri\triinterpolate.py -> build\lib.win-amd64-cpython-311\matplotlib\tri
  copying lib\matplotlib\tri\tripcolor.py -> build\lib.win-amd64-cpython-311\matplotlib\tri
  copying lib\matplotlib\tri\triplot.py -> build\lib.win-amd64-cpython-311\matplotlib\tri
  copying lib\matplotlib\tri\trirefine.py -> build\lib.win-amd64-cpython-311\matplotlib\tri
  copying lib\matplotlib\tri\tritools.py -> build\lib.win-amd64-cpython-311\matplotlib\tri
  copying lib\matplotlib\tri\__init__.py -> build\lib.win-amd64-cpython-311\matplotlib\tri
  creating build\lib.win-amd64-cpython-311\matplotlib\backends\qt_editor
  copying lib\matplotlib\backends\qt_editor\figureoptions.py -> build\lib.win-amd64-cpython-311\matplotlib\backends\qt_editor
  copying lib\matplotlib\backends\qt_editor\formsubplottool.py -> build\lib.win-amd64-cpython-311\matplotlib\backends\qt_editor
  copying lib\matplotlib\backends\qt_editor\_formlayout.py -> build\lib.win-amd64-cpython-311\matplotlib\backends\qt_editor
  copying lib\matplotlib\backends\qt_editor\_formsubplottool.py -> build\lib.win-amd64-cpython-311\matplotlib\backends\qt_editor
  copying lib\matplotlib\backends\qt_editor\__init__.py -> build\lib.win-amd64-cpython-311\matplotlib\backends\qt_editor
  creating build\lib.win-amd64-cpython-311\matplotlib\testing\jpl_units
  copying lib\matplotlib\testing\jpl_units\Duration.py -> build\lib.win-amd64-cpython-311\matplotlib\testing\jpl_units
  copying lib\matplotlib\testing\jpl_units\Epoch.py -> build\lib.win-amd64-cpython-311\matplotlib\testing\jpl_units
  copying lib\matplotlib\testing\jpl_units\EpochConverter.py -> build\lib.win-amd64-cpython-311\matplotlib\testing\jpl_units
  copying lib\matplotlib\testing\jpl_units\StrConverter.py -> build\lib.win-amd64-cpython-311\matplotlib\testing\jpl_units
  copying lib\matplotlib\testing\jpl_units\UnitDbl.py -> build\lib.win-amd64-cpython-311\matplotlib\testing\jpl_units
  copying lib\matplotlib\testing\jpl_units\UnitDblConverter.py -> build\lib.win-amd64-cpython-311\matplotlib\testing\jpl_units
  copying lib\matplotlib\testing\jpl_units\UnitDblFormatter.py -> build\lib.win-amd64-cpython-311\matplotlib\testing\jpl_units
  copying lib\matplotlib\testing\jpl_units\__init__.py -> build\lib.win-amd64-cpython-311\matplotlib\testing\jpl_units
  creating build\lib.win-amd64-cpython-311\mpl_toolkits\axes_grid
  copying lib\mpl_toolkits\axes_grid\anchored_artists.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\axes_grid
  copying lib\mpl_toolkits\axes_grid\angle_helper.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\axes_grid
  copying lib\mpl_toolkits\axes_grid\axes_divider.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\axes_grid
  copying lib\mpl_toolkits\axes_grid\axes_grid.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\axes_grid
  copying lib\mpl_toolkits\axes_grid\axes_rgb.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\axes_grid
  copying lib\mpl_toolkits\axes_grid\axes_size.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\axes_grid
  copying lib\mpl_toolkits\axes_grid\axislines.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\axes_grid
  copying lib\mpl_toolkits\axes_grid\axisline_style.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\axes_grid
  copying lib\mpl_toolkits\axes_grid\axis_artist.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\axes_grid
  copying lib\mpl_toolkits\axes_grid\clip_path.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\axes_grid
  copying lib\mpl_toolkits\axes_grid\colorbar.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\axes_grid
  copying lib\mpl_toolkits\axes_grid\floating_axes.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\axes_grid
  copying lib\mpl_toolkits\axes_grid\grid_finder.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\axes_grid
  copying lib\mpl_toolkits\axes_grid\grid_helper_curvelinear.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\axes_grid
  copying lib\mpl_toolkits\axes_grid\inset_locator.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\axes_grid
  copying lib\mpl_toolkits\axes_grid\parasite_axes.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\axes_grid
  copying lib\mpl_toolkits\axes_grid\__init__.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\axes_grid
  creating build\lib.win-amd64-cpython-311\mpl_toolkits\axes_grid1
  copying lib\mpl_toolkits\axes_grid1\anchored_artists.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\axes_grid1
  copying lib\mpl_toolkits\axes_grid1\axes_divider.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\axes_grid1
  copying lib\mpl_toolkits\axes_grid1\axes_grid.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\axes_grid1
  copying lib\mpl_toolkits\axes_grid1\axes_rgb.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\axes_grid1
  copying lib\mpl_toolkits\axes_grid1\axes_size.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\axes_grid1
  copying lib\mpl_toolkits\axes_grid1\colorbar.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\axes_grid1
  copying lib\mpl_toolkits\axes_grid1\inset_locator.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\axes_grid1
  copying lib\mpl_toolkits\axes_grid1\mpl_axes.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\axes_grid1
  copying lib\mpl_toolkits\axes_grid1\parasite_axes.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\axes_grid1
  copying lib\mpl_toolkits\axes_grid1\__init__.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\axes_grid1
  creating build\lib.win-amd64-cpython-311\mpl_toolkits\axisartist
  copying lib\mpl_toolkits\axisartist\angle_helper.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\axisartist
  copying lib\mpl_toolkits\axisartist\axes_divider.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\axisartist
  copying lib\mpl_toolkits\axisartist\axes_grid.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\axisartist
  copying lib\mpl_toolkits\axisartist\axes_rgb.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\axisartist
  copying lib\mpl_toolkits\axisartist\axislines.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\axisartist
  copying lib\mpl_toolkits\axisartist\axisline_style.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\axisartist
  copying lib\mpl_toolkits\axisartist\axis_artist.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\axisartist
  copying lib\mpl_toolkits\axisartist\clip_path.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\axisartist
  copying lib\mpl_toolkits\axisartist\floating_axes.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\axisartist
  copying lib\mpl_toolkits\axisartist\grid_finder.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\axisartist
  copying lib\mpl_toolkits\axisartist\grid_helper_curvelinear.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\axisartist
  copying lib\mpl_toolkits\axisartist\parasite_axes.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\axisartist
  copying lib\mpl_toolkits\axisartist\__init__.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\axisartist
  creating build\lib.win-amd64-cpython-311\mpl_toolkits\mplot3d
  copying lib\mpl_toolkits\mplot3d\art3d.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\mplot3d
  copying lib\mpl_toolkits\mplot3d\axes3d.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\mplot3d
  copying lib\mpl_toolkits\mplot3d\axis3d.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\mplot3d
  copying lib\mpl_toolkits\mplot3d\proj3d.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\mplot3d
  copying lib\mpl_toolkits\mplot3d\__init__.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\mplot3d
  creating build\lib.win-amd64-cpython-311\mpl_toolkits\tests
  copying lib\mpl_toolkits\tests\conftest.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\tests
  copying lib\mpl_toolkits\tests\test_axes_grid.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\tests
  copying lib\mpl_toolkits\tests\test_axes_grid1.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\tests
  copying lib\mpl_toolkits\tests\test_axisartist_angle_helper.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\tests
  copying lib\mpl_toolkits\tests\test_axisartist_axislines.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\tests
  copying lib\mpl_toolkits\tests\test_axisartist_axis_artist.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\tests
  copying lib\mpl_toolkits\tests\test_axisartist_clip_path.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\tests
  copying lib\mpl_toolkits\tests\test_axisartist_floating_axes.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\tests
  copying lib\mpl_toolkits\tests\test_axisartist_grid_finder.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\tests
  copying lib\mpl_toolkits\tests\test_axisartist_grid_helper_curvelinear.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\tests
  copying lib\mpl_toolkits\tests\test_mplot3d.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\tests
  copying lib\mpl_toolkits\tests\__init__.py -> build\lib.win-amd64-cpython-311\mpl_toolkits\tests
  creating build\lib.win-amd64-cpython-311\matplotlib\mpl-data
  creating build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts
  creating build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  copying lib\matplotlib\mpl-data\fonts\afm\pplbi8a.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  copying lib\matplotlib\mpl-data\fonts\afm\putbi8a.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  copying lib\matplotlib\mpl-data\matplotlibrc -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data
  copying lib\matplotlib\mpl-data\fonts\afm\putri8a.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  creating build\lib.win-amd64-cpython-311\matplotlib\mpl-data\sample_data
  copying lib\matplotlib\mpl-data\sample_data\logo2.png -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\sample_data
  creating build\lib.win-amd64-cpython-311\matplotlib\backends\web_backend
  copying lib\matplotlib\backends\web_backend\.eslintrc.js -> build\lib.win-amd64-cpython-311\matplotlib\backends\web_backend
  creating build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\images\forward.pdf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\backends\web_backend\package.json -> build\lib.win-amd64-cpython-311\matplotlib\backends\web_backend
  creating build\lib.win-amd64-cpython-311\matplotlib\backends\web_backend\css
  copying lib\matplotlib\backends\web_backend\css\page.css -> build\lib.win-amd64-cpython-311\matplotlib\backends\web_backend\css
  copying lib\matplotlib\mpl-data\fonts\afm\pcrro8a.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  copying lib\matplotlib\mpl-data\fonts\afm\pplb8a.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  creating build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\ttf
  copying lib\matplotlib\mpl-data\fonts\ttf\STIXGeneralBolIta.ttf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\ttf
  copying lib\matplotlib\mpl-data\images\forward.png -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  creating build\lib.win-amd64-cpython-311\matplotlib\mpl-data\stylelib
  copying lib\matplotlib\mpl-data\stylelib\bmh.mplstyle -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\stylelib
  copying lib\matplotlib\mpl-data\images\zoom_to_rect-symbolic.svg -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\stylelib\seaborn-dark.mplstyle -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\stylelib
  copying lib\matplotlib\mpl-data\images\qt4_editor_options.svg -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\sample_data\membrane.dat -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\sample_data
  copying lib\matplotlib\mpl-data\fonts\ttf\DejaVuSerif-Italic.ttf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\ttf
  copying lib\matplotlib\mpl-data\images\back.svg -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\images\subplots.gif -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\sample_data\README.txt -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\sample_data
  copying lib\matplotlib\mpl-data\images\zoom_to_rect.svg -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\sample_data\jacksboro_fault_dem.npz -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\sample_data
  copying lib\matplotlib\mpl-data\fonts\ttf\DejaVuSans-Oblique.ttf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\ttf
  copying lib\matplotlib\mpl-data\stylelib\Solarize_Light2.mplstyle -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\stylelib
  copying lib\matplotlib\mpl-data\fonts\afm\phvbo8an.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  copying lib\matplotlib\mpl-data\images\help.png -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  creating build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\pdfcorefonts
  copying lib\matplotlib\mpl-data\fonts\pdfcorefonts\Helvetica-Bold.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\pdfcorefonts
  copying lib\matplotlib\mpl-data\images\home.gif -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\backends\web_backend\css\fbm.css -> build\lib.win-amd64-cpython-311\matplotlib\backends\web_backend\css
  copying lib\matplotlib\mpl-data\images\hand.png -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\images\help-symbolic.svg -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\fonts\afm\ptmri8a.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  copying lib\matplotlib\mpl-data\fonts\ttf\cmss10.ttf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\ttf
  copying lib\matplotlib\mpl-data\fonts\ttf\LICENSE_STIX -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\ttf
  copying lib\matplotlib\backends\web_backend\ipython_inline_figure.html -> build\lib.win-amd64-cpython-311\matplotlib\backends\web_backend
  copying lib\matplotlib\mpl-data\images\help.pdf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\images\home-symbolic.svg -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\fonts\afm\pncbi8a.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  copying lib\matplotlib\mpl-data\fonts\afm\phvbo8a.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  copying lib\matplotlib\mpl-data\fonts\afm\phvr8a.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  copying lib\matplotlib\mpl-data\fonts\ttf\DejaVuSerifDisplay.ttf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\ttf
  copying lib\matplotlib\mpl-data\sample_data\ct.raw.gz -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\sample_data
  copying lib\matplotlib\mpl-data\images\back.pdf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\images\help.ppm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\fonts\ttf\STIXSizTwoSymBol.ttf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\ttf
  copying lib\matplotlib\mpl-data\images\forward.gif -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\stylelib\seaborn-bright.mplstyle -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\stylelib
  copying lib\matplotlib\mpl-data\images\home.svg -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\sample_data\eeg.dat -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\sample_data
  copying lib\matplotlib\mpl-data\fonts\ttf\cmb10.ttf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\ttf
  copying lib\matplotlib\mpl-data\stylelib\seaborn-dark-palette.mplstyle -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\stylelib
  copying lib\matplotlib\mpl-data\images\filesave-symbolic.svg -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\fonts\ttf\DejaVuSansMono-BoldOblique.ttf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\ttf
  copying lib\matplotlib\mpl-data\images\forward_large.gif -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\images\qt4_editor_options.pdf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\fonts\ttf\DejaVuSansMono-Bold.ttf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\ttf
  copying lib\matplotlib\mpl-data\images\matplotlib.png -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\fonts\afm\cmex10.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  copying lib\matplotlib\mpl-data\sample_data\msft.csv -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\sample_data
  copying lib\matplotlib\mpl-data\images\back_large.png -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\sample_data\percent_bachelors_degrees_women_usa.csv -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\sample_data
  copying lib\matplotlib\mpl-data\sample_data\s1045.ima.gz -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\sample_data
  copying lib\matplotlib\backends\web_backend\css\mpl.css -> build\lib.win-amd64-cpython-311\matplotlib\backends\web_backend\css
  copying lib\matplotlib\mpl-data\stylelib\seaborn-whitegrid.mplstyle -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\stylelib
  copying lib\matplotlib\mpl-data\fonts\afm\pcrr8a.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  copying lib\matplotlib\mpl-data\images\back_large.gif -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\stylelib\grayscale.mplstyle -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\stylelib
  copying lib\matplotlib\mpl-data\fonts\afm\pzdr.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  copying lib\matplotlib\mpl-data\fonts\ttf\STIXSizFourSymReg.ttf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\ttf
  copying lib\matplotlib\mpl-data\fonts\afm\pbkd8a.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  copying lib\matplotlib\mpl-data\images\hand.pdf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\fonts\afm\phvb8an.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  copying lib\matplotlib\mpl-data\images\subplots-symbolic.svg -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\fonts\afm\pagk8a.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  copying lib\matplotlib\mpl-data\fonts\pdfcorefonts\Helvetica-BoldOblique.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\pdfcorefonts
  copying lib\matplotlib\mpl-data\fonts\afm\phvro8a.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  copying lib\matplotlib\mpl-data\images\help.svg -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\fonts\afm\phvb8a.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  copying lib\matplotlib\mpl-data\fonts\ttf\STIXNonUniIta.ttf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\ttf
  copying lib\matplotlib\mpl-data\sample_data\Minduka_Present_Blue_Pack.png -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\sample_data
  copying lib\matplotlib\mpl-data\fonts\afm\pzcmi8a.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  copying lib\matplotlib\mpl-data\fonts\ttf\STIXSizTwoSymReg.ttf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\ttf
  copying lib\matplotlib\mpl-data\images\back.png -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\sample_data\goog.npz -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\sample_data
  copying lib\matplotlib\mpl-data\images\move_large.png -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  creating build\lib.win-amd64-cpython-311\matplotlib\backends\web_backend\js
  copying lib\matplotlib\backends\web_backend\js\nbagg_mpl.js -> build\lib.win-amd64-cpython-311\matplotlib\backends\web_backend\js
  copying lib\matplotlib\mpl-data\images\subplots_large.png -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\fonts\pdfcorefonts\ZapfDingbats.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\pdfcorefonts
  copying lib\matplotlib\mpl-data\fonts\ttf\STIXNonUni.ttf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\ttf
  copying lib\matplotlib\mpl-data\images\subplots_large.gif -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\fonts\ttf\STIXGeneralBol.ttf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\ttf
  copying lib\matplotlib\mpl-data\fonts\ttf\STIXSizThreeSymBol.ttf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\ttf
  copying lib\matplotlib\mpl-data\stylelib\classic.mplstyle -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\stylelib
  copying lib\matplotlib\mpl-data\fonts\ttf\DejaVuSansMono.ttf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\ttf
  copying lib\matplotlib\mpl-data\fonts\afm\pncb8a.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  copying lib\matplotlib\mpl-data\sample_data\topobathy.npz -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\sample_data
  copying lib\matplotlib\mpl-data\images\filesave_large.png -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\fonts\afm\ptmb8a.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  copying lib\matplotlib\mpl-data\stylelib\seaborn-paper.mplstyle -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\stylelib
  copying lib\matplotlib\mpl-data\fonts\afm\phvlo8a.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  copying lib\matplotlib\mpl-data\stylelib\seaborn-ticks.mplstyle -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\stylelib
  copying lib\matplotlib\mpl-data\fonts\afm\pplri8a.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  copying lib\matplotlib\mpl-data\fonts\ttf\cmsy10.ttf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\ttf
  copying lib\matplotlib\mpl-data\sample_data\None_vs_nearest-pdf.png -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\sample_data
  copying lib\matplotlib\mpl-data\images\qt4_editor_options.png -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\fonts\afm\pplr8a.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  copying lib\matplotlib\mpl-data\stylelib\fivethirtyeight.mplstyle -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\stylelib
  copying lib\matplotlib\mpl-data\fonts\afm\psyr.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  copying lib\matplotlib\mpl-data\stylelib\seaborn-white.mplstyle -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\stylelib
  copying lib\matplotlib\mpl-data\fonts\ttf\STIXNonUniBolIta.ttf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\ttf
  copying lib\matplotlib\mpl-data\fonts\ttf\DejaVuSans-BoldOblique.ttf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\ttf
  copying lib\matplotlib\mpl-data\fonts\afm\cmtt10.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  copying lib\matplotlib\mpl-data\fonts\pdfcorefonts\Courier.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\pdfcorefonts
  copying lib\matplotlib\mpl-data\stylelib\seaborn-talk.mplstyle -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\stylelib
  copying lib\matplotlib\mpl-data\sample_data\aapl.npz -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\sample_data
  copying lib\matplotlib\mpl-data\images\matplotlib.svg -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\stylelib\seaborn-notebook.mplstyle -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\stylelib
  copying lib\matplotlib\mpl-data\fonts\afm\phvr8an.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  copying lib\matplotlib\backends\web_backend\js\mpl_tornado.js -> build\lib.win-amd64-cpython-311\matplotlib\backends\web_backend\js
  copying lib\matplotlib\mpl-data\fonts\pdfcorefonts\Courier-Oblique.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\pdfcorefonts
  copying lib\matplotlib\mpl-data\fonts\afm\pagd8a.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  copying lib\matplotlib\backends\web_backend\.prettierrc -> build\lib.win-amd64-cpython-311\matplotlib\backends\web_backend
  copying lib\matplotlib\mpl-data\stylelib\fast.mplstyle -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\stylelib
  copying lib\matplotlib\mpl-data\images\zoom_to_rect.gif -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\fonts\afm\cmr10.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  copying lib\matplotlib\mpl-data\images\zoom_to_rect_large.gif -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\backends\web_backend\css\boilerplate.css -> build\lib.win-amd64-cpython-311\matplotlib\backends\web_backend\css
  copying lib\matplotlib\mpl-data\fonts\ttf\DejaVuSerif-Bold.ttf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\ttf
  copying lib\matplotlib\mpl-data\fonts\pdfcorefonts\readme.txt -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\pdfcorefonts
  copying lib\matplotlib\mpl-data\fonts\ttf\cmex10.ttf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\ttf
  copying lib\matplotlib\mpl-data\fonts\ttf\LICENSE_DEJAVU -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\ttf
  copying lib\matplotlib\mpl-data\stylelib\dark_background.mplstyle -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\stylelib
  copying lib\matplotlib\mpl-data\images\hand_large.gif -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\stylelib\seaborn-deep.mplstyle -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\stylelib
  copying lib\matplotlib\mpl-data\images\move.svg -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\images\help.gif -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\stylelib\tableau-colorblind10.mplstyle -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\stylelib
  copying lib\matplotlib\mpl-data\fonts\pdfcorefonts\Times-Roman.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\pdfcorefonts
  copying lib\matplotlib\mpl-data\stylelib\ggplot.mplstyle -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\stylelib
  copying lib\matplotlib\mpl-data\stylelib\seaborn.mplstyle -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\stylelib
  copying lib\matplotlib\mpl-data\fonts\afm\pbkli8a.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  copying lib\matplotlib\mpl-data\images\forward_large.png -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\fonts\pdfcorefonts\Helvetica.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\pdfcorefonts
  copying lib\matplotlib\mpl-data\images\forward-symbolic.svg -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\sample_data\data_x_x2_x3.csv -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\sample_data
  copying lib\matplotlib\mpl-data\fonts\pdfcorefonts\Symbol.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\pdfcorefonts
  copying lib\matplotlib\mpl-data\fonts\ttf\STIXGeneral.ttf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\ttf
  copying lib\matplotlib\mpl-data\sample_data\grace_hopper.png -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\sample_data
  copying lib\matplotlib\mpl-data\fonts\pdfcorefonts\Helvetica-Oblique.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\pdfcorefonts
  copying lib\matplotlib\mpl-data\fonts\ttf\DejaVuSans-Bold.ttf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\ttf
  copying lib\matplotlib\mpl-data\images\back-symbolic.svg -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\fonts\afm\cmmi10.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  copying lib\matplotlib\mpl-data\fonts\afm\pcrbo8a.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  copying lib\matplotlib\mpl-data\fonts\ttf\STIXGeneralItalic.ttf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\ttf
  copying lib\matplotlib\mpl-data\fonts\pdfcorefonts\Times-BoldItalic.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\pdfcorefonts
  copying lib\matplotlib\mpl-data\images\zoom_to_rect_large.png -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\fonts\ttf\STIXSizFiveSymReg.ttf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\ttf
  copying lib\matplotlib\backends\web_backend\all_figures.html -> build\lib.win-amd64-cpython-311\matplotlib\backends\web_backend
  copying lib\matplotlib\mpl-data\images\move.pdf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\stylelib\seaborn-muted.mplstyle -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\stylelib
  copying lib\matplotlib\mpl-data\fonts\ttf\DejaVuSerif-BoldItalic.ttf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\ttf
  copying lib\matplotlib\mpl-data\images\filesave.svg -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\fonts\afm\putr8a.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  copying lib\matplotlib\backends\web_backend\nbagg_uat.ipynb -> build\lib.win-amd64-cpython-311\matplotlib\backends\web_backend
  copying lib\matplotlib\mpl-data\sample_data\embedding_in_wx3.xrc -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\sample_data
  copying lib\matplotlib\mpl-data\fonts\afm\pbkl8a.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  copying lib\matplotlib\mpl-data\fonts\ttf\STIXSizOneSymBol.ttf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\ttf
  copying lib\matplotlib\mpl-data\images\move.gif -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\fonts\afm\putb8a.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  copying lib\matplotlib\mpl-data\stylelib\seaborn-poster.mplstyle -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\stylelib
  copying lib\matplotlib\backends\web_backend\.prettierignore -> build\lib.win-amd64-cpython-311\matplotlib\backends\web_backend
  copying lib\matplotlib\mpl-data\images\home_large.gif -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\fonts\ttf\cmmi10.ttf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\ttf
  copying lib\matplotlib\mpl-data\images\filesave.gif -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\images\hand.gif -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\images\qt4_editor_options_large.png -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\fonts\ttf\STIXSizFourSymBol.ttf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\ttf
  copying lib\matplotlib\mpl-data\fonts\afm\pbkdi8a.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  copying lib\matplotlib\mpl-data\fonts\afm\pncr8a.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  copying lib\matplotlib\mpl-data\fonts\afm\phvl8a.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  copying lib\matplotlib\mpl-data\sample_data\grace_hopper.jpg -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\sample_data
  copying lib\matplotlib\mpl-data\fonts\afm\ptmbi8a.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  copying lib\matplotlib\mpl-data\fonts\pdfcorefonts\Courier-Bold.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\pdfcorefonts
  copying lib\matplotlib\mpl-data\images\help_large.png -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\images\matplotlib_large.png -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\images\filesave.pdf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\fonts\afm\phvro8an.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  copying lib\matplotlib\mpl-data\images\forward.svg -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\stylelib\seaborn-colorblind.mplstyle -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\stylelib
  copying lib\matplotlib\mpl-data\images\matplotlib_128.ppm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\images\zoom_to_rect.pdf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\images\subplots.png -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\stylelib\seaborn-darkgrid.mplstyle -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\stylelib
  copying lib\matplotlib\mpl-data\fonts\afm\pagdo8a.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  copying lib\matplotlib\mpl-data\stylelib\seaborn-pastel.mplstyle -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\stylelib
  copying lib\matplotlib\mpl-data\fonts\afm\cmsy10.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  copying lib\matplotlib\backends\web_backend\js\mpl.js -> build\lib.win-amd64-cpython-311\matplotlib\backends\web_backend\js
  copying lib\matplotlib\mpl-data\images\move-symbolic.svg -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\images\help_large.ppm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\fonts\afm\pagko8a.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  copying lib\matplotlib\mpl-data\fonts\afm\ptmr8a.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  copying lib\matplotlib\mpl-data\images\matplotlib.pdf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\fonts\pdfcorefonts\Times-Bold.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\pdfcorefonts
  copying lib\matplotlib\mpl-data\images\home.pdf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\images\home_large.png -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\sample_data\demodata.csv -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\sample_data
  copying lib\matplotlib\mpl-data\images\subplots.pdf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\fonts\ttf\DejaVuSansDisplay.ttf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\ttf
  copying lib\matplotlib\mpl-data\fonts\ttf\STIXNonUniBol.ttf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\ttf
  copying lib\matplotlib\mpl-data\images\home.png -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\fonts\afm\pncri8a.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  copying lib\matplotlib\mpl-data\fonts\pdfcorefonts\Courier-BoldOblique.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\pdfcorefonts
  copying lib\matplotlib\mpl-data\fonts\ttf\DejaVuSansMono-Oblique.ttf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\ttf
  copying lib\matplotlib\mpl-data\fonts\ttf\STIXSizOneSymReg.ttf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\ttf
  copying lib\matplotlib\mpl-data\images\move_large.gif -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\images\zoom_to_rect.png -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\fonts\afm\pcrb8a.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\afm
  copying lib\matplotlib\mpl-data\images\filesave_large.gif -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\sample_data\ada.png -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\sample_data
  copying lib\matplotlib\mpl-data\images\hand.svg -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\images\move.png -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\fonts\ttf\DejaVuSerif.ttf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\ttf
  copying lib\matplotlib\mpl-data\images\subplots.svg -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\fonts\ttf\DejaVuSans.ttf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\ttf
  creating build\lib.win-amd64-cpython-311\matplotlib\mpl-data\sample_data\axes_grid
  copying lib\matplotlib\mpl-data\sample_data\axes_grid\bivariate_normal.npy -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\sample_data\axes_grid
  copying lib\matplotlib\mpl-data\fonts\ttf\cmtt10.ttf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\ttf
  copying lib\matplotlib\mpl-data\fonts\ttf\cmr10.ttf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\ttf
  copying lib\matplotlib\mpl-data\stylelib\_classic_test_patch.mplstyle -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\stylelib
  copying lib\matplotlib\backends\web_backend\single_figure.html -> build\lib.win-amd64-cpython-311\matplotlib\backends\web_backend
  copying lib\matplotlib\mpl-data\fonts\ttf\STIXSizThreeSymReg.ttf -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\ttf
  copying lib\matplotlib\mpl-data\fonts\pdfcorefonts\Times-Italic.afm -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\fonts\pdfcorefonts
  copying lib\matplotlib\mpl-data\images\back.gif -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  copying lib\matplotlib\mpl-data\images\filesave.png -> build\lib.win-amd64-cpython-311\matplotlib\mpl-data\images
  UPDATING build\lib.win-amd64-cpython-311\matplotlib\_version.py
  set build\lib.win-amd64-cpython-311\matplotlib\_version.py to '3.3.4'
  running build_ext
  Extracting freetype-2.6.1.tar.gz
  Building freetype in build\freetype-2.6.1
  error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/
  [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for matplotlib
ERROR: Could not build wheels for matplotlib, which is required to install pyproject.toml-based projects

Note: After running the above cell, kindly restart the notebook kernel and run all cells sequentially from the start again.

In [4]:
# Common Libraries
import numpy as np # linear algebra
import pandas as pd # data manipulation and analysis
import scipy.stats as stats # mathematical algorithms and convenience functions
import statsmodels.stats.multicomp as stats_sm # statistical models, statistical tests, and statistical data exploration
import statsmodels.stats.proportion as stats_sp # statistical models, statistical tests, and statistical data exploration
import matplotlib.pyplot as plt # data visualization
import seaborn as sns # data visualization
import pylab # for QQ plots
from scipy.stats import zscore

# Command to tell Python to actually display the graphs
%matplotlib inline 
sns.set_style('whitegrid') # set style for visualization
pd.set_option('display.float_format', lambda x: '%.4f' % x) # To supress numerical display in scientific notations


#For randomized data splitting
from sklearn.model_selection import train_test_split
#To build linear regression_model
import statsmodels.api as sm
#To check model performance
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
#To check multicollinearity
from statsmodels.stats.outliers_influence import variance_inflation_factor
#To check Heteroscedasticity
import statsmodels.stats.api as sms
from statsmodels.compat import lzip

Loading the dataset¶

In [5]:
path1='C:\\Users\\otroc\\OneDrive\\Documents\\Carlos\\Training\\DSBA\\Python\\Jupyter Notebooks\\Module3_Project\\used_device_data.csv'
df = pd.read_csv(path1)

Data Overview¶

  • Observations
  • Sanity checks
  • Missing value treatment
  • Feature engineering
In [6]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3454 entries, 0 to 3453
Data columns (total 15 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   brand_name             3454 non-null   object 
 1   os                     3454 non-null   object 
 2   screen_size            3454 non-null   float64
 3   4g                     3454 non-null   object 
 4   5g                     3454 non-null   object 
 5   main_camera_mp         3275 non-null   float64
 6   selfie_camera_mp       3452 non-null   float64
 7   int_memory             3450 non-null   float64
 8   ram                    3450 non-null   float64
 9   battery                3448 non-null   float64
 10  weight                 3447 non-null   float64
 11  release_year           3454 non-null   int64  
 12  days_used              3454 non-null   int64  
 13  normalized_used_price  3454 non-null   float64
 14  normalized_new_price   3454 non-null   float64
dtypes: float64(9), int64(2), object(4)
memory usage: 404.9+ KB
In [7]:
df.head()
Out[7]:
brand_name os screen_size 4g 5g main_camera_mp selfie_camera_mp int_memory ram battery weight release_year days_used normalized_used_price normalized_new_price
0 Honor Android 14.5000 yes no 13.0000 5.0000 64.0000 3.0000 3020.0000 146.0000 2020 127 4.3076 4.7151
1 Honor Android 17.3000 yes yes 13.0000 16.0000 128.0000 8.0000 4300.0000 213.0000 2020 325 5.1621 5.5190
2 Honor Android 16.6900 yes yes 13.0000 8.0000 128.0000 8.0000 4200.0000 213.0000 2020 162 5.1111 5.8846
3 Honor Android 25.5000 yes yes 13.0000 8.0000 64.0000 6.0000 7250.0000 480.0000 2020 345 5.1354 5.6310
4 Honor Android 15.3200 yes no 13.0000 8.0000 64.0000 3.0000 5000.0000 185.0000 2020 293 4.3900 4.9478
In [8]:
df.tail()
Out[8]:
brand_name os screen_size 4g 5g main_camera_mp selfie_camera_mp int_memory ram battery weight release_year days_used normalized_used_price normalized_new_price
3449 Asus Android 15.3400 yes no NaN 8.0000 64.0000 6.0000 5000.0000 190.0000 2019 232 4.4923 6.4839
3450 Asus Android 15.2400 yes no 13.0000 8.0000 128.0000 8.0000 4000.0000 200.0000 2018 541 5.0377 6.2515
3451 Alcatel Android 15.8000 yes no 13.0000 5.0000 32.0000 3.0000 4000.0000 165.0000 2020 201 4.3573 4.5288
3452 Alcatel Android 15.8000 yes no 13.0000 5.0000 32.0000 2.0000 4000.0000 160.0000 2020 149 4.3498 4.6242
3453 Alcatel Android 12.8300 yes no 13.0000 5.0000 16.0000 2.0000 4000.0000 168.0000 2020 176 4.1321 4.2800
In [9]:
print("There are", df.shape[0], 'rows and', df.shape[1], "columns.")
There are 3454 rows and 15 columns.
In [10]:
df.describe(include='all').T
Out[10]:
count unique top freq mean std min 25% 50% 75% max
brand_name 3454 34 Others 502 NaN NaN NaN NaN NaN NaN NaN
os 3454 4 Android 3214 NaN NaN NaN NaN NaN NaN NaN
screen_size 3454.0000 NaN NaN NaN 13.7131 3.8053 5.0800 12.7000 12.8300 15.3400 30.7100
4g 3454 2 yes 2335 NaN NaN NaN NaN NaN NaN NaN
5g 3454 2 no 3302 NaN NaN NaN NaN NaN NaN NaN
main_camera_mp 3275.0000 NaN NaN NaN 9.4602 4.8155 0.0800 5.0000 8.0000 13.0000 48.0000
selfie_camera_mp 3452.0000 NaN NaN NaN 6.5542 6.9704 0.0000 2.0000 5.0000 8.0000 32.0000
int_memory 3450.0000 NaN NaN NaN 54.5731 84.9724 0.0100 16.0000 32.0000 64.0000 1024.0000
ram 3450.0000 NaN NaN NaN 4.0361 1.3651 0.0200 4.0000 4.0000 4.0000 12.0000
battery 3448.0000 NaN NaN NaN 3133.4027 1299.6828 500.0000 2100.0000 3000.0000 4000.0000 9720.0000
weight 3447.0000 NaN NaN NaN 182.7519 88.4132 69.0000 142.0000 160.0000 185.0000 855.0000
release_year 3454.0000 NaN NaN NaN 2015.9653 2.2985 2013.0000 2014.0000 2015.5000 2018.0000 2020.0000
days_used 3454.0000 NaN NaN NaN 674.8697 248.5802 91.0000 533.5000 690.5000 868.7500 1094.0000
normalized_used_price 3454.0000 NaN NaN NaN 4.3647 0.5889 1.5369 4.0339 4.4051 4.7557 6.6194
normalized_new_price 3454.0000 NaN NaN NaN 5.2331 0.6836 2.9014 4.7903 5.2459 5.6737 7.8478
In [11]:
df.duplicated().sum()
Out[11]:
0
In [12]:
df.nunique()
Out[12]:
brand_name                 34
os                          4
screen_size               142
4g                          2
5g                          2
main_camera_mp             41
selfie_camera_mp           37
int_memory                 15
ram                        12
battery                   324
weight                    555
release_year                8
days_used                 924
normalized_used_price    3094
normalized_new_price     2988
dtype: int64
In [13]:
df.isnull().sum()
Out[13]:
brand_name                 0
os                         0
screen_size                0
4g                         0
5g                         0
main_camera_mp           179
selfie_camera_mp           2
int_memory                 4
ram                        4
battery                    6
weight                     7
release_year               0
days_used                  0
normalized_used_price      0
normalized_new_price       0
dtype: int64
In [14]:
df.isnull().sum().sum()
Out[14]:
202
In [15]:
df.isnull().sum().sum()/df.shape[0]
Out[15]:
0.05848291835552982

NOTES:

  • Data includes 3454 rows and 15 columns.
  • 5.8% of data is missing (NaN).
In [16]:
Brands_table=pd.DataFrame({'Count':df['brand_name'].value_counts(),'Proportion':df['brand_name'].value_counts(normalize=True)})
Brands_table
Out[16]:
Count Proportion
Others 502 0.1453
Samsung 341 0.0987
Huawei 251 0.0727
LG 201 0.0582
Lenovo 171 0.0495
ZTE 140 0.0405
Xiaomi 132 0.0382
Oppo 129 0.0373
Asus 122 0.0353
Alcatel 121 0.0350
Micromax 117 0.0339
Vivo 117 0.0339
Honor 116 0.0336
HTC 110 0.0318
Nokia 106 0.0307
Motorola 106 0.0307
Sony 86 0.0249
Meizu 62 0.0180
Gionee 56 0.0162
Acer 51 0.0148
XOLO 49 0.0142
Panasonic 47 0.0136
Realme 41 0.0119
Apple 39 0.0113
Lava 36 0.0104
Celkon 33 0.0096
Spice 30 0.0087
Karbonn 29 0.0084
Coolpad 22 0.0064
BlackBerry 22 0.0064
Microsoft 22 0.0064
OnePlus 22 0.0064
Google 15 0.0043
Infinix 10 0.0029
In [17]:
df.groupby('brand_name')['main_camera_mp'].mean()
Out[17]:
brand_name
Acer          6.9676
Alcatel       6.4322
Apple         9.8205
Asus         10.0182
BlackBerry   10.3333
Celkon        3.9970
Coolpad      11.2632
Gionee        9.9134
Google       11.9333
HTC          10.7927
Honor        12.3276
Huawei       10.3404
Infinix          NaN
Karbonn       6.7552
LG            8.3450
Lava          6.8456
Lenovo        8.9518
Meizu        13.6085
Micromax      6.1047
Microsoft     9.4545
Motorola     12.9205
Nokia         5.8396
OnePlus      14.2000
Oppo         10.4445
Others        8.0130
Panasonic    10.4278
Realme       13.0000
Samsung       9.1774
Sony         14.8709
Spice         4.9217
Vivo         12.8592
XOLO          7.4551
Xiaomi       12.3991
ZTE          11.6956
Name: main_camera_mp, dtype: float64
In [18]:
df[df['brand_name']=="Infinix"]
Out[18]:
brand_name os screen_size 4g 5g main_camera_mp selfie_camera_mp int_memory ram battery weight release_year days_used normalized_used_price normalized_new_price
59 Infinix Android 17.3200 yes no NaN 8.0000 32.0000 2.0000 6000.0000 209.0000 2020 245 4.2821 4.5976
60 Infinix Android 15.3900 yes no NaN 8.0000 64.0000 4.0000 5000.0000 185.0000 2020 173 4.3636 4.7118
61 Infinix Android 15.3900 yes no NaN 8.0000 32.0000 2.0000 5000.0000 185.0000 2020 256 4.1814 4.5055
62 Infinix Android 15.3900 yes no NaN 16.0000 32.0000 3.0000 4000.0000 178.0000 2019 316 4.5552 4.6022
63 Infinix Android 15.2900 yes no NaN 16.0000 32.0000 2.0000 4000.0000 165.0000 2019 468 4.4167 4.8713
278 Infinix Android 17.3200 yes no NaN 8.0000 32.0000 2.0000 6000.0000 209.0000 2020 320 4.4051 4.6054
279 Infinix Android 15.3900 yes no NaN 8.0000 64.0000 4.0000 5000.0000 185.0000 2020 173 4.4959 4.7021
280 Infinix Android 15.3900 yes no NaN 8.0000 32.0000 2.0000 5000.0000 185.0000 2020 329 4.3707 4.4873
281 Infinix Android 15.3900 yes no NaN 16.0000 32.0000 3.0000 4000.0000 178.0000 2019 356 4.4180 4.6060
282 Infinix Android 15.2900 yes no NaN 16.0000 32.0000 2.0000 4000.0000 165.0000 2019 497 4.4233 4.8661
In [19]:
df[df['brand_name']=="Infinix"].shape
Out[19]:
(10, 15)

NOTES:

  • Missing values concentrated on variables related to camera type (181 out of the total 202 missing values) where imputation will be conducted.
  • As imputation will be done using mean value, this is calculated by brand. Resulting on the identification of a particular case for brand "Infinix" requiring freature engineering.
  • All (10) units from brand: Infinix have missing value on feature "main_camera_mp". As those units do have a selfie camera it is assumed it should also have a main camera, then, the missing information will be imputed.
  • Brands might follow different approachs to design main camera. It is assumed that is the device have a selfie camara, should also have a main camera, then NaN imputation by mean is considered for the variable of main camera by grouping the data on brand and selfie camera.
  • The feature selfie_camera_mp feature have a 2 missing values. As this feature will be used as grouping criteria to impute main_camera_mp, the first step will be imputing the feature selfie_cameramp.
In [20]:
df2=df.copy()
In [21]:
df2[(df2['selfie_camera_mp'].isnull()==True)]
Out[21]:
brand_name os screen_size 4g 5g main_camera_mp selfie_camera_mp int_memory ram battery weight release_year days_used normalized_used_price normalized_new_price
1080 Google Android 15.3200 yes no 12.2000 NaN 64.0000 4.0000 3430.0000 184.0000 2018 475 5.5738 6.8660
1081 Google Android 12.8300 yes no 12.2000 NaN 64.0000 4.0000 2915.0000 148.0000 2018 424 4.4650 6.7451
In [22]:
# Group by 'brand_name' and impute with mean
df2['selfie_camera_mp'] = df2.groupby(['brand_name'])['selfie_camera_mp'].transform(lambda x: x.fillna(x.mean()))
df2.isnull().sum()
Out[22]:
brand_name                 0
os                         0
screen_size                0
4g                         0
5g                         0
main_camera_mp           179
selfie_camera_mp           0
int_memory                 4
ram                        4
battery                    6
weight                     7
release_year               0
days_used                  0
normalized_used_price      0
normalized_new_price       0
dtype: int64

NOTES:

  • Imputed 2 missing values on selfie_camera_mp. Remains in total 200 missing values.
In [23]:
# Group by 'brand_name' and 'selfie_camera_mp' and impute with mean
# as several steps might be required, a new variable is defined for imputing main camera
df2['main_camera_mp_imp'] = df2.groupby(['brand_name','selfie_camera_mp'])['main_camera_mp'].transform(lambda x: x.fillna(x.mean()))
df2.isnull().sum()
Out[23]:
brand_name                 0
os                         0
screen_size                0
4g                         0
5g                         0
main_camera_mp           179
selfie_camera_mp           0
int_memory                 4
ram                        4
battery                    6
weight                     7
release_year               0
days_used                  0
normalized_used_price      0
normalized_new_price       0
main_camera_mp_imp        19
dtype: int64

NOTES:

  • Imputed 160 out of 179 main_camera_mp missing data, still 19 cases remains from this variable. Remains in total 40 missing values.
  • As there is no main camera reference for Infinix brand, it will be imputed considering "others" brands group, also grouped by selfie camera
In [24]:
#checking remaining missing values on main camera
df2[(df2['main_camera_mp_imp'].isnull()==True)]
Out[24]:
brand_name os screen_size 4g 5g main_camera_mp selfie_camera_mp int_memory ram battery weight release_year days_used normalized_used_price normalized_new_price main_camera_mp_imp
59 Infinix Android 17.3200 yes no NaN 8.0000 32.0000 2.0000 6000.0000 209.0000 2020 245 4.2821 4.5976 NaN
60 Infinix Android 15.3900 yes no NaN 8.0000 64.0000 4.0000 5000.0000 185.0000 2020 173 4.3636 4.7118 NaN
61 Infinix Android 15.3900 yes no NaN 8.0000 32.0000 2.0000 5000.0000 185.0000 2020 256 4.1814 4.5055 NaN
62 Infinix Android 15.3900 yes no NaN 16.0000 32.0000 3.0000 4000.0000 178.0000 2019 316 4.5552 4.6022 NaN
63 Infinix Android 15.2900 yes no NaN 16.0000 32.0000 2.0000 4000.0000 165.0000 2019 468 4.4167 4.8713 NaN
204 ZTE Android 16.8900 yes yes NaN 12.0000 256.0000 8.0000 5100.0000 215.0000 2020 235 5.3909 6.3947 NaN
205 ZTE Android 16.8900 yes yes NaN 12.0000 128.0000 6.0000 5100.0000 210.0000 2020 278 4.6521 5.7401 NaN
278 Infinix Android 17.3200 yes no NaN 8.0000 32.0000 2.0000 6000.0000 209.0000 2020 320 4.4051 4.6054 NaN
279 Infinix Android 15.3900 yes no NaN 8.0000 64.0000 4.0000 5000.0000 185.0000 2020 173 4.4959 4.7021 NaN
280 Infinix Android 15.3900 yes no NaN 8.0000 32.0000 2.0000 5000.0000 185.0000 2020 329 4.3707 4.4873 NaN
281 Infinix Android 15.3900 yes no NaN 16.0000 32.0000 3.0000 4000.0000 178.0000 2019 356 4.4180 4.6060 NaN
282 Infinix Android 15.2900 yes no NaN 16.0000 32.0000 2.0000 4000.0000 165.0000 2019 497 4.4233 4.8661 NaN
401 Coolpad Android 16.5900 yes yes NaN 16.0000 64.0000 4.0000 4000.0000 195.0000 2020 252 5.1742 5.8855 NaN
819 BlackBerry Android 15.2100 yes no NaN 16.0000 64.0000 4.0000 4000.0000 170.0000 2018 629 4.6939 5.8530 NaN
820 BlackBerry Android 15.2100 yes no NaN 16.0000 64.0000 4.0000 4000.0000 170.0000 2018 383 4.9463 5.7093 NaN
2202 Panasonic Android 15.7000 yes no NaN 16.0000 128.0000 4.0000 3000.0000 195.0000 2018 717 4.8731 5.8562 NaN
3268 Realme Android 15.3700 yes no NaN 13.0000 64.0000 4.0000 5000.0000 198.0000 2019 299 4.7008 4.9674 NaN
3409 Realme Android 15.3700 yes no NaN 13.0000 64.0000 4.0000 5000.0000 198.0000 2019 293 4.4877 4.9674 NaN
3448 Asus Android 16.7400 yes no NaN 24.0000 128.0000 8.0000 6000.0000 240.0000 2019 325 5.7153 7.0593 NaN
In [25]:
df2[(df2['main_camera_mp_imp'].isnull()==True)|df2['selfie_camera_mp'].isnull()==True].shape
Out[25]:
(19, 16)
In [26]:
df2[(df2['brand_name']=="Infinix")]['selfie_camera_mp'].unique()
Out[26]:
array([ 8., 16.])
In [27]:
df2[(df2['brand_name']=="Others")]['selfie_camera_mp'].unique()
Out[27]:
array([ 0.3, 20. ,  8. , 16. , 24. ,  5. ,  2. , 13. ,  4. ,  1. ,  2.1,
        3. ,  1.3,  1.2, 16.3])

NOTES:

  • For "Infinix" 8mp selfie camera, the main camera value will be the average of main camera from "Others" with 8mp selfie camera
  • For "Infinix" 16mp selfie camera, the main camera value will be the average of main camera from "Others" with 16mp selfie camera
  • Filter1 critera considers brand Infinix or Others (Others limited to those with selfie_camera_mp of 8 or 1)
In [28]:
filter1 = ((df2['brand_name'] == "Others") & (df2['selfie_camera_mp'].isin([8, 16]))) | (df2['brand_name'] == "Infinix")
df2.loc[filter1, 'main_camera_mp_imp'] = df2[filter1].groupby('selfie_camera_mp')['main_camera_mp_imp'].transform(lambda x: x.fillna(x.mean()))
df2.isnull().sum()
Out[28]:
brand_name                 0
os                         0
screen_size                0
4g                         0
5g                         0
main_camera_mp           179
selfie_camera_mp           0
int_memory                 4
ram                        4
battery                    6
weight                     7
release_year               0
days_used                  0
normalized_used_price      0
normalized_new_price       0
main_camera_mp_imp         9
dtype: int64

NOTES:

  • Imputed 10 out of 19 main_camera_mp missing data, still 9 cases remains from this variable. Remains in total 30 missing values.
  • Similar approach will be considered by imputing the main_camera_mp value by the average of main camera from same brand grouped by selfie camer
In [29]:
#checking remaining missing values on main camera
df2[(df2['main_camera_mp_imp'].isnull()==True)]
Out[29]:
brand_name os screen_size 4g 5g main_camera_mp selfie_camera_mp int_memory ram battery weight release_year days_used normalized_used_price normalized_new_price main_camera_mp_imp
204 ZTE Android 16.8900 yes yes NaN 12.0000 256.0000 8.0000 5100.0000 215.0000 2020 235 5.3909 6.3947 NaN
205 ZTE Android 16.8900 yes yes NaN 12.0000 128.0000 6.0000 5100.0000 210.0000 2020 278 4.6521 5.7401 NaN
401 Coolpad Android 16.5900 yes yes NaN 16.0000 64.0000 4.0000 4000.0000 195.0000 2020 252 5.1742 5.8855 NaN
819 BlackBerry Android 15.2100 yes no NaN 16.0000 64.0000 4.0000 4000.0000 170.0000 2018 629 4.6939 5.8530 NaN
820 BlackBerry Android 15.2100 yes no NaN 16.0000 64.0000 4.0000 4000.0000 170.0000 2018 383 4.9463 5.7093 NaN
2202 Panasonic Android 15.7000 yes no NaN 16.0000 128.0000 4.0000 3000.0000 195.0000 2018 717 4.8731 5.8562 NaN
3268 Realme Android 15.3700 yes no NaN 13.0000 64.0000 4.0000 5000.0000 198.0000 2019 299 4.7008 4.9674 NaN
3409 Realme Android 15.3700 yes no NaN 13.0000 64.0000 4.0000 5000.0000 198.0000 2019 293 4.4877 4.9674 NaN
3448 Asus Android 16.7400 yes no NaN 24.0000 128.0000 8.0000 6000.0000 240.0000 2019 325 5.7153 7.0593 NaN

NOTES:

  • Filter2 critera considers brands with the 9 missing values remaining on main camera
In [30]:
filter2=df2['brand_name'].isin(df2[(df2['main_camera_mp_imp'].isnull()==True)]['brand_name'].unique())
df2.loc[filter2, 'main_camera_mp_imp'] = df2[filter2].groupby('selfie_camera_mp')['main_camera_mp_imp'].transform(lambda x: x.fillna(x.mean()))
df2.isnull().sum()
Out[30]:
brand_name                 0
os                         0
screen_size                0
4g                         0
5g                         0
main_camera_mp           179
selfie_camera_mp           0
int_memory                 4
ram                        4
battery                    6
weight                     7
release_year               0
days_used                  0
normalized_used_price      0
normalized_new_price       0
main_camera_mp_imp         3
dtype: int64

NOTES:

  • Imputed 6 out of 9 main_camera_mp missing data, still 3 cases remains from this variable. Remains in total 24 missing values.
  • Imputation for those 3 cases will be done by camera average by brand (regardless of selfie camera

)

In [31]:
#checking remaining missing values on main camera
df2[(df2['main_camera_mp_imp'].isnull()==True)]
Out[31]:
brand_name os screen_size 4g 5g main_camera_mp selfie_camera_mp int_memory ram battery weight release_year days_used normalized_used_price normalized_new_price main_camera_mp_imp
204 ZTE Android 16.8900 yes yes NaN 12.0000 256.0000 8.0000 5100.0000 215.0000 2020 235 5.3909 6.3947 NaN
205 ZTE Android 16.8900 yes yes NaN 12.0000 128.0000 6.0000 5100.0000 210.0000 2020 278 4.6521 5.7401 NaN
3448 Asus Android 16.7400 yes no NaN 24.0000 128.0000 8.0000 6000.0000 240.0000 2019 325 5.7153 7.0593 NaN
In [32]:
df2[(df2['brand_name']=="ZTE")&(df2['selfie_camera_mp']==12)]
Out[32]:
brand_name os screen_size 4g 5g main_camera_mp selfie_camera_mp int_memory ram battery weight release_year days_used normalized_used_price normalized_new_price main_camera_mp_imp
204 ZTE Android 16.8900 yes yes NaN 12.0000 256.0000 8.0000 5100.0000 215.0000 2020 235 5.3909 6.3947 NaN
205 ZTE Android 16.8900 yes yes NaN 12.0000 128.0000 6.0000 5100.0000 210.0000 2020 278 4.6521 5.7401 NaN
In [33]:
df2[(df2['brand_name']=="Asus")&(df2['selfie_camera_mp']==24)]
Out[33]:
brand_name os screen_size 4g 5g main_camera_mp selfie_camera_mp int_memory ram battery weight release_year days_used normalized_used_price normalized_new_price main_camera_mp_imp
3448 Asus Android 16.7400 yes no NaN 24.0000 128.0000 8.0000 6000.0000 240.0000 2019 325 5.7153 7.0593 NaN

NOTES:

  • Filter3 critera considers brands of the 3 missing values remaining on main_camera_mp
In [34]:
filter3=(df2['brand_name']=="ZTE")|(df2['brand_name']=="Asus")
df2.loc[filter3, 'main_camera_mp_imp']=df2[filter3].groupby('brand_name')['main_camera_mp_imp'].transform(lambda x: x.fillna(x.mean()))
df2.isnull().sum()
Out[34]:
brand_name                 0
os                         0
screen_size                0
4g                         0
5g                         0
main_camera_mp           179
selfie_camera_mp           0
int_memory                 4
ram                        4
battery                    6
weight                     7
release_year               0
days_used                  0
normalized_used_price      0
normalized_new_price       0
main_camera_mp_imp         0
dtype: int64
In [35]:
df2.drop('main_camera_mp', axis=1, inplace=True)
In [36]:
df2.columns
Out[36]:
Index(['brand_name', 'os', 'screen_size', '4g', '5g', 'selfie_camera_mp',
       'int_memory', 'ram', 'battery', 'weight', 'release_year', 'days_used',
       'normalized_used_price', 'normalized_new_price', 'main_camera_mp_imp'],
      dtype='object')
In [37]:
df2.isnull().sum()
Out[37]:
brand_name               0
os                       0
screen_size              0
4g                       0
5g                       0
selfie_camera_mp         0
int_memory               4
ram                      4
battery                  6
weight                   7
release_year             0
days_used                0
normalized_used_price    0
normalized_new_price     0
main_camera_mp_imp       0
dtype: int64

NOTES:

  • Imputed last 3 main_camera_mp missing values. Remains in total 21 missing values.
  • Check missing values on remaining variables "int_memory"(4), "ram"(4), "battery"(6), and "weight"(7)
  • The imputation will be done with mean, grouping by bran.

d

In [38]:
df2[df2['int_memory'].isnull()==True]
Out[38]:
brand_name os screen_size 4g 5g selfie_camera_mp int_memory ram battery weight release_year days_used normalized_used_price normalized_new_price main_camera_mp_imp
117 Nokia Others 5.1800 yes no 0.0000 NaN 0.0200 1200.0000 86.5000 2019 234 2.7213 3.6884 0.3000
2035 Nokia Others 5.1800 no no 0.0000 NaN 0.0300 1020.0000 157.0000 2019 501 2.3437 3.4203 5.0000
2064 Nokia Others 5.1800 no no 0.0000 NaN 0.0200 1100.0000 78.4000 2015 559 2.5870 3.3786 0.3000
2092 Nokia Others 7.6200 no no 0.0000 NaN 0.0200 1010.0000 100.0000 2013 1043 3.5357 4.3706 5.0000
In [39]:
df2[df2['ram'].isnull()==True]
Out[39]:
brand_name os screen_size 4g 5g selfie_camera_mp int_memory ram battery weight release_year days_used normalized_used_price normalized_new_price main_camera_mp_imp
114 Nokia Others 5.1800 no no 0.0000 0.0600 NaN 1020.0000 91.3000 2020 288 2.7292 2.9113 0.3000
335 Nokia Others 5.1800 no no 0.0000 0.1000 NaN 1200.0000 88.2000 2020 327 3.0629 3.6891 0.3000
2059 Nokia Others 5.1800 no no 0.0000 0.0600 NaN NaN 82.6000 2016 1023 2.7651 3.6579 0.3000
2090 Nokia Others 7.6200 no no 0.0000 0.0600 NaN 1200.0000 111.4000 2013 1001 3.8278 4.6058 5.0000
In [40]:
df2[df2['battery'].isnull()==True]
Out[40]:
brand_name os screen_size 4g 5g selfie_camera_mp int_memory ram battery weight release_year days_used normalized_used_price normalized_new_price main_camera_mp_imp
1829 Meizu Android 12.8300 yes no 5.0000 16.0000 4.0000 NaN 145.0000 2014 986 4.1779 4.8636 13.0000
1831 Meizu Android 12.8300 yes no 5.0000 16.0000 4.0000 NaN 158.0000 2014 1043 4.8789 5.9906 20.7000
1832 Meizu Android 13.6100 yes no 2.0000 16.0000 4.0000 NaN 147.0000 2014 1007 4.7423 5.8261 20.7000
1962 Microsoft Windows 25.5500 no no 3.5000 32.0000 4.0000 NaN 675.9000 2013 931 5.2306 5.8028 5.0000
2058 Nokia Others 5.1800 no no 0.0000 0.0600 0.0200 NaN 81.0000 2016 815 2.7187 3.3745 0.3000
2059 Nokia Others 5.1800 no no 0.0000 0.0600 NaN NaN 82.6000 2016 1023 2.7651 3.6579 0.3000
In [41]:
df2[df2['weight'].isnull()==True]
Out[41]:
brand_name os screen_size 4g 5g selfie_camera_mp int_memory ram battery weight release_year days_used normalized_used_price normalized_new_price main_camera_mp_imp
3002 XOLO Android 12.7000 yes no 5.0000 32.0000 4.0000 2400.0000 NaN 2015 576 4.1659 4.9304 13.0000
3003 XOLO Android 12.8300 yes no 5.0000 16.0000 4.0000 3200.0000 NaN 2015 800 4.2821 5.1892 8.0000
3004 XOLO Android 12.7000 no no 2.0000 32.0000 4.0000 2100.0000 NaN 2015 878 3.8797 4.0811 8.0000
3005 XOLO Android 10.2900 no no 0.3000 32.0000 4.0000 1800.0000 NaN 2015 1036 3.8238 4.3961 5.0000
3006 XOLO Android 12.7000 no no 0.3000 16.0000 4.0000 2500.0000 NaN 2015 679 3.8371 4.3472 5.0000
3007 XOLO Windows 12.7000 no no 2.0000 32.0000 4.0000 2200.0000 NaN 2015 838 3.7072 4.7917 8.0000
3008 XOLO Android 12.7000 no no 5.0000 32.0000 4.0000 2500.0000 NaN 2015 1045 4.1846 4.7854 8.0000
In [42]:
# Group by 'brand_name' and impute with mean
df2['int_memory'] = df2.groupby(['brand_name'])['int_memory'].transform(lambda x: x.fillna(x.mean()))
df2.isnull().sum()
Out[42]:
brand_name               0
os                       0
screen_size              0
4g                       0
5g                       0
selfie_camera_mp         0
int_memory               0
ram                      4
battery                  6
weight                   7
release_year             0
days_used                0
normalized_used_price    0
normalized_new_price     0
main_camera_mp_imp       0
dtype: int64
In [43]:
# Group by 'brand_name' and impute with mean
df2['ram'] = df2.groupby(['brand_name'])['ram'].transform(lambda x: x.fillna(x.mean()))
df2.isnull().sum()
Out[43]:
brand_name               0
os                       0
screen_size              0
4g                       0
5g                       0
selfie_camera_mp         0
int_memory               0
ram                      0
battery                  6
weight                   7
release_year             0
days_used                0
normalized_used_price    0
normalized_new_price     0
main_camera_mp_imp       0
dtype: int64
In [44]:
# Group by 'brand_name' and impute with mean
df2['battery'] = df2.groupby(['brand_name'])['battery'].transform(lambda x: x.fillna(x.mean()))
df2.isnull().sum()
Out[44]:
brand_name               0
os                       0
screen_size              0
4g                       0
5g                       0
selfie_camera_mp         0
int_memory               0
ram                      0
battery                  0
weight                   7
release_year             0
days_used                0
normalized_used_price    0
normalized_new_price     0
main_camera_mp_imp       0
dtype: int64
In [45]:
# Group by 'brand_name' and impute with mean
df2['weight'] = df2.groupby(['brand_name'])['weight'].transform(lambda x: x.fillna(x.mean()))
df2.isnull().sum()
Out[45]:
brand_name               0
os                       0
screen_size              0
4g                       0
5g                       0
selfie_camera_mp         0
int_memory               0
ram                      0
battery                  0
weight                   0
release_year             0
days_used                0
normalized_used_price    0
normalized_new_price     0
main_camera_mp_imp       0
dtype: int64

NOTES:

  • All (202) missing values imputed, using central tendency measures (mean) of a column grouped by categories where the data under similar categories are likely to have similar properties.
In [46]:
df2.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3454 entries, 0 to 3453
Data columns (total 15 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   brand_name             3454 non-null   object 
 1   os                     3454 non-null   object 
 2   screen_size            3454 non-null   float64
 3   4g                     3454 non-null   object 
 4   5g                     3454 non-null   object 
 5   selfie_camera_mp       3454 non-null   float64
 6   int_memory             3454 non-null   float64
 7   ram                    3454 non-null   float64
 8   battery                3454 non-null   float64
 9   weight                 3454 non-null   float64
 10  release_year           3454 non-null   int64  
 11  days_used              3454 non-null   int64  
 12  normalized_used_price  3454 non-null   float64
 13  normalized_new_price   3454 non-null   float64
 14  main_camera_mp_imp     3454 non-null   float64
dtypes: float64(9), int64(2), object(4)
memory usage: 404.9+ KB
In [47]:
df2.nunique()
Out[47]:
brand_name                 34
os                          4
screen_size               142
4g                          2
5g                          2
selfie_camera_mp           38
int_memory                 16
ram                        13
battery                   327
weight                    556
release_year                8
days_used                 924
normalized_used_price    3094
normalized_new_price     2988
main_camera_mp_imp         66
dtype: int64
In [48]:
df2.describe().T
Out[48]:
count mean std min 25% 50% 75% max
screen_size 3454.0000 13.7131 3.8053 5.0800 12.7000 12.8300 15.3400 30.7100
selfie_camera_mp 3454.0000 6.5548 6.9684 0.0000 2.0000 5.0000 8.0000 32.0000
int_memory 3454.0000 54.5395 84.9289 0.0100 16.0000 32.0000 64.0000 1024.0000
ram 3454.0000 4.0343 1.3654 0.0200 4.0000 4.0000 4.0000 12.0000
battery 3454.0000 3132.8833 1298.8047 500.0000 2100.0000 3000.0000 4000.0000 9720.0000
weight 3454.0000 182.6876 88.3351 69.0000 142.0000 160.0000 185.0000 855.0000
release_year 3454.0000 2015.9653 2.2985 2013.0000 2014.0000 2015.5000 2018.0000 2020.0000
days_used 3454.0000 674.8697 248.5802 91.0000 533.5000 690.5000 868.7500 1094.0000
normalized_used_price 3454.0000 4.3647 0.5889 1.5369 4.0339 4.4051 4.7557 6.6194
normalized_new_price 3454.0000 5.2331 0.6836 2.9014 4.7903 5.2459 5.6737 7.8478
main_camera_mp_imp 3454.0000 9.6458 4.7918 0.0800 5.0000 8.0000 13.0000 48.0000
In [49]:
df.describe(include="O").T
Out[49]:
count unique top freq
brand_name 3454 34 Others 502
os 3454 4 Android 3214
4g 3454 2 yes 2335
5g 3454 2 no 3302
In [50]:
num_cols = df.select_dtypes(include=np.number).columns.tolist()
cat_cols = df2.select_dtypes(include=['object', 'category']).columns.tolist()
for column in cat_cols:
    print(df2[column].value_counts())
    print("-" * 50)
Others        502
Samsung       341
Huawei        251
LG            201
Lenovo        171
ZTE           140
Xiaomi        132
Oppo          129
Asus          122
Alcatel       121
Micromax      117
Vivo          117
Honor         116
HTC           110
Nokia         106
Motorola      106
Sony           86
Meizu          62
Gionee         56
Acer           51
XOLO           49
Panasonic      47
Realme         41
Apple          39
Lava           36
Celkon         33
Spice          30
Karbonn        29
Coolpad        22
BlackBerry     22
Microsoft      22
OnePlus        22
Google         15
Infinix        10
Name: brand_name, dtype: int64
--------------------------------------------------
Android    3214
Others      137
Windows      67
iOS          36
Name: os, dtype: int64
--------------------------------------------------
yes    2335
no     1119
Name: 4g, dtype: int64
--------------------------------------------------
no     3302
yes     152
Name: 5g, dtype: int64
--------------------------------------------------

NOTES:

  • No issues on variable's data type.
  • After missing value treatment, no missing value remains
  • droped the variable main_camera_mp (with NaN) and replaced with main_camera_mp_imp (imputed, without NaN).
  • There are more than 34 brands represented in the sample, with one category labeled as "Others" as the most frequent category
  • There are 4 operative sistem, being the most popular Android
  • There are more 4G capable devices than 5G

Consolidated Notes from Data Overview¶

Observations

  • The goal is to predict the price of a used device and identify influent factors.
  • Data includes 3454 rows and 15 columns.
  • 5.8% of data is missing (NaN).

Missing value treatment

  • Missing values concentrated on variables related to camera type (181 out of the total 202 missing values) where imputation will be conducted.
  • As imputation will be done using mean value, this is calculated by brand. Resulting on the identification of a particular case for brand "Infinix" requiring freature engineering.
  • All (10) units from brand: Infinix have missing value on feature "main_camera_mp". As those units do have a selfie camera it is assumed it should also have a main camera, then, the missing information will be imputed.
  • Brands might follow different approachs to design main camera. It is assumed that is the device have a selfie camara, should also have a main camera, then NaN imputation by mean is considered for the variable of main camera by grouping the data on brand and selfie camera.
  • The feature selfie_camera_mp feature have a 2 missing values. As this feature will be used as grouping criteria to impute main_camera_mp, the first step will be imputing the feature selfie_camera_mp.
  • Imputed 2 missing values on selfie_camera_mp. Remains in total 200 missing values.
  • Imputed 160 out of 179 main_camera_mp missing data, still 19 cases remains from this variable. Remains in total 40 missing values.
  • As there is no main camera reference for Infinix brand, it will be imputed considering "others" brands group, also grouped by selfie camera.
  • For "Infinix" 8mp selfie camera, the main camera value will be the average of main camera from "Others" with 8mp selfie camera
  • For "Infinix" 16mp selfie camera, the main camera value will be the average of main camera from "Others" with 16mp selfie camera
  • Filter1 critera considers brand Infinix or Others (Others limited to those with selfie_camera_mp of 8 or 16)
  • Imputed 10 out of 19 main_camera_mp missing data, still 9 cases remains from this variable. Remains in total 30 missing values.
  • Similar approach will be considered by imputing the main_camera_mp value by the average of main camera from same brand grouped by selfie camera
  • Filter2 critera considers brands with the 9 missing values remaining on main camera
  • Imputed 6 out of 9 main_camera_mp missing data, still 3 cases remains from this variable. Remains in total 24 missing values.
  • Imputation for those 3 cases will be done by camera average by brand (regardless of selfie camera)
  • Filter3 critera considers brands of the 3 missing values remaining on main_camera_mp
  • Imputed last 3 main_camera_mp missing values. Remains in total 21 missing values.
  • Check missing values on remaining variables "int_memory"(4), "ram"(4), "battery"(6), and "weight"(7)
  • The imputation will be done with mean, grouping by brand.
  • All (202) missing values imputed, using central tendency measures (mean) of a column grouped by categories where the data under similar categories are likely to have similar properties.

Sanity checks

  • No issues on variable's data type.
  • After missing value treatment, no missing value remains
  • droped the variable main_camera_mp (with NaN) and replaced with main_camera_mp_imp (imputed, without NaN).
  • There are more than 34 brands represented in the sample, with one category labeled as "Others" as the most frequent category
  • There are 4 operative sistem, being the most popular Android
  • There are more 4G capable devices than 5G

Exploratory Data Analysis (EDA)¶

  • EDA is an important part of any project involving data.
  • It is important to investigate and understand the data better before building a model with it.
  • A few questions have been mentioned below which will help you approach the analysis in the right manner and generate insights from the data.
  • A thorough analysis of the data, in addition to the questions mentioned below, should be done.
In [51]:
plt.figure(figsize=(15, 10))
for i, column in enumerate(df2.select_dtypes(include=['object', 'category']).columns, 1):
    plt.subplot(2, 2, i)
    sns.countplot(x=df2[column])
    plt.title(column)
    plt.xticks(rotation=90)
plt.tight_layout()
plt.show()
No description has been provided for this image

NOTES:

  • Brand_name show 34 brands with very ood count number.
  • The variable "Others" is assumed include any number of brands not individualized, is the category with biggest count.
  • os variable show clearly Android is the most common operative system
  • 4G capable devices are around twice the non capable
  • There are very few 5G capable devices
  • There is a lot of outliers in many variables. Detailed analysis required.
In [52]:
plt.figure(figsize=(15, 10))
for i, column in enumerate(df2.select_dtypes(include=[np.number]).columns, 1):
    plt.subplot(3, 4, i)
    sns.boxplot(x=df2[column])
    plt.title(column)
plt.tight_layout()
plt.show()
No description has been provided for this image
In [53]:
plt.figure(figsize=(15, 10))
for i, column in enumerate(df2.select_dtypes(include=[np.number]).columns, 1):
    plt.subplot(3, 4, i)
    sns.histplot(x=df2[column],kde=True)
    plt.title(column)
plt.tight_layout()
plt.show()
No description has been provided for this image

NOTES:

  • normalized_used_price and normalized_new_price have a normal distribution as expected by variable naming
  • None of the others numerical variables have normal distribution, presenting skewing and multimode distributions.
In [54]:
sns.pairplot(df2.select_dtypes(include=[np.number]),diag_kind="kde");
No description has been provided for this image
In [55]:
df2[df2.select_dtypes(include=np.number).columns.tolist()].corr()
Out[55]:
screen_size selfie_camera_mp int_memory ram battery weight release_year days_used normalized_used_price normalized_new_price main_camera_mp_imp
screen_size 1.0000 0.2716 0.0719 0.2764 0.8114 0.8289 0.3642 -0.2917 0.6148 0.4609 0.1668
selfie_camera_mp 0.2716 1.0000 0.2966 0.4781 0.3699 -0.0046 0.6909 -0.5526 0.6078 0.4749 0.4229
int_memory 0.0719 0.2966 1.0000 0.1238 0.1184 0.0155 0.2351 -0.2423 0.1912 0.1963 0.0420
ram 0.2764 0.4781 0.1238 1.0000 0.2822 0.0912 0.3128 -0.2794 0.5212 0.5329 0.2631
battery 0.8114 0.3699 0.1184 0.2822 1.0000 0.6992 0.4884 -0.3707 0.6127 0.4705 0.2701
weight 0.8289 -0.0046 0.0155 0.0912 0.6992 1.0000 0.0716 -0.0679 0.3826 0.2699 -0.0828
release_year 0.3642 0.6909 0.2351 0.3128 0.4884 0.0716 1.0000 -0.7504 0.5098 0.3037 0.3797
days_used -0.2917 -0.5526 -0.2423 -0.2794 -0.3707 -0.0679 -0.7504 1.0000 -0.3583 -0.2166 -0.1889
normalized_used_price 0.6148 0.6078 0.1912 0.5212 0.6127 0.3826 0.5098 -0.3583 1.0000 0.8345 0.5866
normalized_new_price 0.4609 0.4749 0.1963 0.5329 0.4705 0.2699 0.3037 -0.2166 0.8345 1.0000 0.5368
main_camera_mp_imp 0.1668 0.4229 0.0420 0.2631 0.2701 -0.0828 0.3797 -0.1889 0.5866 0.5368 1.0000

NOTES:

  • normalized_used_price and normalized_new_price have high correlation (0.83)
  • screen_size have high correlation with battery(0.81) and weight (0.83)
In [56]:
pd.crosstab(df2['4g'],df2['5g'],margins="all")
Out[56]:
5g no yes All
4g
no 1119 0 1119
yes 2183 152 2335
All 3302 152 3454
In [57]:
#tech=pd.crosstab(df['4g'],df['5g'],normalize='index')
#tech
pd.crosstab(df2['4g'],df2['5g'],normalize='index').plot.bar(stacked=True);
No description has been provided for this image

NOTES:

  • Only 152 devices are both 4G and 5G compatible
  • There are 1119 devices of other technologies not specified (i.e. 3G, 2G)
  • 2183 devices are only 4G

Consolidated Notes from Exploratory Data Analysis¶

Univariate Analysis. Categorical variables

  • Brand_name show 34 brands with very ood count number.
  • The variable "Others" is assumed include any number of brands not individualized, is the category with biggest count.
  • os variable show clearly Android is the most common operative system
  • 4G capable devices are around twice the non capable
  • There are very few 5G capable devices
  • There is a lot of outliers in many variables. Detailed analysis required.

Univariate Analysis. Numerical variables

  • normalized_used_price and normalized_new_price have a normal distribution as expected by variable naming
  • None of the others numerical variables have normal distribution, presenting skewing and multimode distributions.

Bivariate Analysis. Categorical variables

  • Only 152 devices are both 4G and 5G compatible
  • There are 1119 devices of other technologies not specified (i.e. 3G, 2G)
  • 2183 devices are only 4G

Bivariate Analysis. Numerical variables

  • normalized_used_price and normalized_new_price have high correlation (0.83)
  • screen_size have high correlation with battery(0.81) and weight (0.83)

Questions:

  1. What does the distribution of normalized used device prices look like?
  2. What percentage of the used device market is dominated by Android devices?
  3. The amount of RAM is important for the smooth functioning of a device. How does the amount of RAM vary with the brand?
  4. A large battery often increases a device's weight, making it feel uncomfortable in the hands. How does the weight vary for phones and tablets offering large batteries (more than 4500 mAh)?
  5. Bigger screens are desirable for entertainment purposes as they offer a better viewing experience. How many phones and tablets are available across different brands with a screen size larger than 6 inches?
  6. A lot of devices nowadays offer great selfie cameras, allowing us to capture our favorite moments with loved ones. What is the distribution of devices offering greater than 8MP selfie cameras across brands?
  7. Which attributes are highly correlated with the normalized price of a used device?

Answers¶

  1. The distribution of normalized used device prices looks similar to a normal distribution, left skewed.
  2. Andoid os represents 93% of used device market.
  3. The amount of RAM per device varies from 0 to 12, whith mean of 4Mb and std of 1.37. This distribution is quite similar across all device brands with some exceptions.
  4. The weight of devices with large batteries have a multimode, right skewed distribution, with values between 118 and 855 grams, with a mean of 332 grams
  5. 31.8% of devices have a screen size larger than 6 inches (1099 out of 3454)
  6. There are 655 devices with large selfie cameras, distrbuted on 25 brands, being the most popular brand "Huawei" with 87 units. The distribution is multimodal.
  7. normalized_new_price (0.83) is the attribute with higher correlation with normalized_used_price, followed by screen_size, battery and selfie_camera_mp (0.61) and main_camera_mp_imp (0.59).
In [58]:
#1. What does the distribution of normalized used device prices look like?
sns.displot(data=df2, x="normalized_used_price", kde=True);
No description has been provided for this image

Answers:

  1. The distribution of normalized used device prices looks similar to a normal distribution, left skewed.
In [59]:
# 2.What percentage of the used device market is dominated by Android devices?
df2['os'].value_counts(normalize=True)
Out[59]:
Android   0.9305
Others    0.0397
Windows   0.0194
iOS       0.0104
Name: os, dtype: float64

Answers:

  1. Andoid os represents 93% of used device market.
In [60]:
#3. The amount of RAM is important for the smooth functioning of a device. How does the amount of RAM vary with the brand?
df2['ram'].describe()
Out[60]:
count   3454.0000
mean       4.0343
std        1.3654
min        0.0200
25%        4.0000
50%        4.0000
75%        4.0000
max       12.0000
Name: ram, dtype: float64
In [61]:
#3. The amount of RAM is important for the smooth functioning of a device. How does the amount of RAM vary with the brand?
cum_ram=df2.groupby('brand_name')['ram'].describe().sort_values(by='count', ascending=False)
cum_ram['cum_count']=cum_ram['count'].cumsum()
cum_ram['cum_pct']=cum_ram['cum_count']/cum_ram['count'].sum()
cum_ram
Out[61]:
count mean std min 25% 50% 75% max cum_count cum_pct
brand_name
Others 502.0000 3.7779 1.0158 0.2500 4.0000 4.0000 4.0000 8.0000 502.0000 0.1453
Samsung 341.0000 4.1994 1.3771 0.2500 4.0000 4.0000 4.0000 12.0000 843.0000 0.2441
Huawei 251.0000 4.6554 1.5954 0.2500 4.0000 4.0000 4.0000 12.0000 1094.0000 0.3167
LG 201.0000 3.9366 1.0765 0.2500 4.0000 4.0000 4.0000 8.0000 1295.0000 0.3749
Lenovo 171.0000 3.8860 0.7742 0.2500 4.0000 4.0000 4.0000 6.0000 1466.0000 0.4244
ZTE 140.0000 4.0232 0.9095 0.2500 4.0000 4.0000 4.0000 8.0000 1606.0000 0.4650
Xiaomi 132.0000 4.5833 1.5085 2.0000 4.0000 4.0000 4.0000 12.0000 1738.0000 0.5032
Oppo 129.0000 4.9612 2.1228 1.0000 4.0000 4.0000 6.0000 12.0000 1867.0000 0.5405
Asus 122.0000 4.0492 0.6010 2.0000 4.0000 4.0000 4.0000 8.0000 1989.0000 0.5759
Alcatel 121.0000 3.4070 1.2637 0.2500 4.0000 4.0000 4.0000 4.0000 2110.0000 0.6109
Vivo 117.0000 4.7564 1.6382 0.5000 4.0000 4.0000 4.0000 8.0000 2227.0000 0.6448
Micromax 117.0000 3.6795 1.0529 0.2500 4.0000 4.0000 4.0000 4.0000 2344.0000 0.6786
Honor 116.0000 4.6034 1.6252 2.0000 4.0000 4.0000 6.0000 8.0000 2460.0000 0.7122
HTC 110.0000 4.0000 0.3318 3.0000 4.0000 4.0000 4.0000 6.0000 2570.0000 0.7441
Nokia 106.0000 2.4203 1.8530 0.0200 0.0300 3.5000 4.0000 6.0000 2676.0000 0.7748
Motorola 106.0000 3.9434 1.3297 2.0000 4.0000 4.0000 4.0000 12.0000 2782.0000 0.8054
Sony 86.0000 4.0698 0.4800 4.0000 4.0000 4.0000 4.0000 8.0000 2868.0000 0.8303
Meizu 62.0000 4.4516 1.2238 2.0000 4.0000 4.0000 4.0000 8.0000 2930.0000 0.8483
Gionee 56.0000 3.9330 0.5011 0.2500 4.0000 4.0000 4.0000 4.0000 2986.0000 0.8645
Acer 51.0000 3.9020 0.5002 1.0000 4.0000 4.0000 4.0000 4.0000 3037.0000 0.8793
XOLO 49.0000 4.0000 0.0000 4.0000 4.0000 4.0000 4.0000 4.0000 3086.0000 0.8935
Panasonic 47.0000 4.0000 0.0000 4.0000 4.0000 4.0000 4.0000 4.0000 3133.0000 0.9071
Realme 41.0000 4.1951 1.3270 2.0000 3.0000 4.0000 6.0000 6.0000 3174.0000 0.9189
Apple 39.0000 4.0000 0.6070 2.0000 4.0000 4.0000 4.0000 6.0000 3213.0000 0.9302
Lava 36.0000 3.2778 1.4139 0.2500 4.0000 4.0000 4.0000 4.0000 3249.0000 0.9406
Celkon 33.0000 1.6136 1.8319 0.2500 0.2500 0.2500 4.0000 4.0000 3282.0000 0.9502
Spice 30.0000 3.7500 0.9514 0.2500 4.0000 4.0000 4.0000 4.0000 3312.0000 0.9589
Karbonn 29.0000 3.3534 1.4416 0.2500 4.0000 4.0000 4.0000 4.0000 3341.0000 0.9673
Microsoft 22.0000 4.0000 0.0000 4.0000 4.0000 4.0000 4.0000 4.0000 3363.0000 0.9737
OnePlus 22.0000 6.3636 2.5920 4.0000 4.0000 6.0000 8.0000 12.0000 3385.0000 0.9800
Coolpad 22.0000 3.9545 0.2132 3.0000 4.0000 4.0000 4.0000 4.0000 3407.0000 0.9864
BlackBerry 22.0000 3.8295 0.7995 0.2500 4.0000 4.0000 4.0000 4.0000 3429.0000 0.9928
Google 15.0000 4.5333 0.9155 4.0000 4.0000 4.0000 5.0000 6.0000 3444.0000 0.9971
Infinix 10.0000 2.6000 0.8433 2.0000 2.0000 2.0000 3.0000 4.0000 3454.0000 1.0000
In [62]:
sns.displot(df2, x="ram", kde=True);
No description has been provided for this image
In [63]:
sns.displot(
    data=df2, 
    x="ram", 
    col="brand_name", 
    kde=True, 
    col_wrap=6, 
    height=2, 
    aspect=1
);
No description has been provided for this image
In [64]:
sns.catplot(data=df2, y="ram", hue="brand_name", kind="box", col="brand_name", col_wrap=4);
No description has been provided for this image
In [65]:
sns.catplot(data=df2, y="ram", hue="brand_name", kind="box", col="brand_name", showfliers=False, col_wrap=4);
No description has been provided for this image

Answers 3. The amount of RAM per device varies from 0 to 12, whith mean of 4Mb and std of 1.37. This distribution is quite similar across all device brands with some exceptions.

In [66]:
# 4. A large battery often increases a device's weight, making it feel uncomfortable in the hands. 
# How does the weight vary for phones and tablets offering large batteries (more than 4500 mAh)?
df2[df2['battery']>4500]['weight'].describe()
Out[66]:
count   341.0000
mean    332.2757
std     155.5018
min     118.0000
25%     198.0000
50%     300.0000
75%     467.0000
max     855.0000
Name: weight, dtype: float64
In [67]:
sns.displot(data=df2[df2['battery']>4500], x='weight', kde=True);
No description has been provided for this image
In [68]:
sns.boxplot(data=df2[df2['battery']>4500], x='weight');
No description has been provided for this image

Answers:

  1. The weight of devices with large batteries have a multimode, right skewed distribution, with values between 118 and 855 grams, with a mean of 332 grams
In [69]:
# 5.Bigger screens are desirable for entertainment purposes as they offer a better viewing experience. 
# How many phones and tablets are available across different brands with a screen size larger than 6 inches?
pd.DataFrame({'count_all':df2['brand_name'].value_counts(),'count_large':df2[df2['screen_size']>6*2.54]['brand_name'].value_counts()}).sort_values(by='count_large', ascending=False)
Out[69]:
count_all count_large
Huawei 251 149.0000
Samsung 341 119.0000
Others 502 99.0000
Vivo 117 80.0000
Honor 116 72.0000
Oppo 129 70.0000
Xiaomi 132 69.0000
Lenovo 171 69.0000
LG 201 59.0000
Motorola 106 42.0000
Asus 122 41.0000
Realme 41 40.0000
Alcatel 121 26.0000
Apple 39 24.0000
Acer 51 19.0000
ZTE 140 17.0000
Meizu 62 17.0000
OnePlus 22 16.0000
Nokia 106 15.0000
Sony 86 12.0000
Infinix 10 10.0000
HTC 110 7.0000
Micromax 117 7.0000
Google 15 4.0000
Gionee 56 3.0000
XOLO 49 3.0000
Coolpad 22 3.0000
Karbonn 29 2.0000
Panasonic 47 2.0000
Spice 30 2.0000
Microsoft 22 1.0000
BlackBerry 22 NaN
Celkon 33 NaN
Lava 36 NaN
In [70]:
df2[df2['screen_size']>6*2.54]['brand_name'].value_counts().sum()
Out[70]:
1099
In [71]:
df2.shape[0]
Out[71]:
3454
In [72]:
df2[df2['screen_size']>6*2.54]['brand_name'].value_counts().sum()/df2.shape[0]
Out[72]:
0.3181818181818182

Answers:

  1. 31.8% of devices have a screen size larger than 6 inches (1099 out of 3454)
In [73]:
#6. A lot of devices nowadays offer great selfie cameras, allowing us to capture our favorite moments with loved ones. 
# What is the distribution of devices offering greater than 8MP selfie cameras across brands?
df2[df2['selfie_camera_mp']>8]['brand_name'].value_counts()
Out[73]:
Huawei        87
Vivo          78
Oppo          75
Xiaomi        63
Samsung       57
Honor         41
Others        34
LG            32
Motorola      26
Meizu         24
HTC           20
ZTE           20
Realme        18
OnePlus       18
Lenovo        14
Sony          14
Nokia         10
Asus           6
Infinix        4
Gionee         4
Coolpad        3
BlackBerry     2
Micromax       2
Panasonic      2
Acer           1
Name: brand_name, dtype: int64
In [74]:
df2[df2['selfie_camera_mp']>8]['brand_name'].describe()
Out[74]:
count        655
unique        25
top       Huawei
freq          87
Name: brand_name, dtype: object
In [75]:
sns.displot(data=df2[df2['selfie_camera_mp']>8], x='brand_name', kde=True);
plt.xticks(rotation=90)
plt.show()
No description has been provided for this image

Answers:

  1. There are 655 devices with large selfie cameras, distrbuted on 25 brands, being the most popular brand "Huawei" with 87 units. The distribution is multimodal.
In [76]:
# 7. Which attributes are highly correlated with the normalized price of a used device?
df2[df2.select_dtypes(include=np.number).columns.tolist()].corr()['normalized_used_price'].sort_values(ascending=False)
Out[76]:
normalized_used_price    1.0000
normalized_new_price     0.8345
screen_size              0.6148
battery                  0.6127
selfie_camera_mp         0.6078
main_camera_mp_imp       0.5866
ram                      0.5212
release_year             0.5098
weight                   0.3826
int_memory               0.1912
days_used               -0.3583
Name: normalized_used_price, dtype: float64

Answers:

  1. normalized_new_price (0.83) is the attribute with higher correlation with normalized_used_price, followed by screen_size, battery and selfie_camera_mp (0.61) and main_camera_mp_imp (0.59).

Data Preprocessing¶

  • Outlier detection and treatment (if needed)
  • Feature engineering
  • Preparing data for modeling
  • Any other preprocessing steps (if needed)

NOTES:

  • From visualization it was noticed there is a lot of outliers in many variables. Detailed analysis required.
In [77]:
for column in df2.select_dtypes(include=np.number).columns:
    outliers=len(df2[(df2[column] < df2[column].quantile(0.25)-1.5*(df2[column].quantile(0.75)-df2[column].quantile(0.25))) | (df2[column] > df2[column].quantile(0.75)+1.5*(df2[column].quantile(0.75)-df2[column].quantile(0.25)))][column])
    print(f'{column}: {outliers} outliers')
screen_size: 450 outliers
selfie_camera_mp: 221 outliers
int_memory: 138 outliers
ram: 639 outliers
battery: 77 outliers
weight: 368 outliers
release_year: 0 outliers
days_used: 0 outliers
normalized_used_price: 85 outliers
normalized_new_price: 66 outliers
main_camera_mp_imp: 5 outliers

NOTES:

  • Quantified outliers is noticed only two variables have no outliers, while the rest have up to 639 (18% of samples).
In [78]:
df3=df2[(np.abs(df2.select_dtypes(include=np.number).apply(zscore))<3).all(axis=1)]
df3
Out[78]:
brand_name os screen_size 4g 5g selfie_camera_mp int_memory ram battery weight release_year days_used normalized_used_price normalized_new_price main_camera_mp_imp
0 Honor Android 14.5000 yes no 5.0000 64.0000 3.0000 3020.0000 146.0000 2020 127 4.3076 4.7151 13.0000
1 Honor Android 17.3000 yes yes 16.0000 128.0000 8.0000 4300.0000 213.0000 2020 325 5.1621 5.5190 13.0000
2 Honor Android 16.6900 yes yes 8.0000 128.0000 8.0000 4200.0000 213.0000 2020 162 5.1111 5.8846 13.0000
4 Honor Android 15.3200 yes no 8.0000 64.0000 3.0000 5000.0000 185.0000 2020 293 4.3900 4.9478 13.0000
5 Honor Android 16.2300 yes no 8.0000 64.0000 4.0000 4000.0000 176.0000 2020 223 4.4139 5.0607 13.0000
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
3449 Asus Android 15.3400 yes no 8.0000 64.0000 6.0000 5000.0000 190.0000 2019 232 4.4923 6.4839 14.7778
3450 Asus Android 15.2400 yes no 8.0000 128.0000 8.0000 4000.0000 200.0000 2018 541 5.0377 6.2515 13.0000
3451 Alcatel Android 15.8000 yes no 5.0000 32.0000 3.0000 4000.0000 165.0000 2020 201 4.3573 4.5288 13.0000
3452 Alcatel Android 15.8000 yes no 5.0000 32.0000 2.0000 4000.0000 160.0000 2020 149 4.3498 4.6242 13.0000
3453 Alcatel Android 12.8300 yes no 5.0000 16.0000 2.0000 4000.0000 168.0000 2020 176 4.1321 4.2800 13.0000

3111 rows × 15 columns

In [79]:
df2.shape[0]-df3.shape[0]
Out[79]:
343

NOTES:

  • Outliers will be removed using Z-Score Method, considering outliers all data points with z-score greater than 3 or less than -3.
  • A total of 343 rows removed from dataframe as considered outliers.
In [80]:
for column in df3.select_dtypes(include=np.number).columns:
    df3[column]=np.clip(df3[column], df3[column].quantile(0.25)-1.5*(df3[column].quantile(0.75)-df3[column].quantile(0.25)), df3[column].quantile(0.75)+1.5*(df2[column].quantile(0.75)-df2[column].quantile(0.25)))
In [81]:
for column in df3.select_dtypes(include=np.number).columns:
    outliers=len(df3[(df3[column] < df3[column].quantile(0.25)-1.5*(df3[column].quantile(0.75)-df3[column].quantile(0.25))) | (df3[column] > df3[column].quantile(0.75)+1.5*(df2[column].quantile(0.75)-df2[column].quantile(0.25)))][column])
    print(f'{column}: {outliers} outliers')
screen_size: 0 outliers
selfie_camera_mp: 0 outliers
int_memory: 0 outliers
ram: 0 outliers
battery: 0 outliers
weight: 0 outliers
release_year: 0 outliers
days_used: 0 outliers
normalized_used_price: 0 outliers
normalized_new_price: 0 outliers
main_camera_mp_imp: 0 outliers

NOTES:

  • The clip function of NumPy is aplied to set all the values smaller than lower_whisker will be assigned the value of lower_whisker, and all the values greater than upper_whisker will be assigned the value of upper_whisker.
In [82]:
df3['release_year'].value_counts()
Out[82]:
2014    594
2015    494
2013    489
2016    371
2019    354
2018    303
2017    288
2020    218
Name: release_year, dtype: int64
In [83]:
df3['years_old']=2024-df3['release_year']
In [84]:
df3.drop('release_year', axis=1, inplace=True)
In [85]:
df3.head()
Out[85]:
brand_name os screen_size 4g 5g selfie_camera_mp int_memory ram battery weight days_used normalized_used_price normalized_new_price main_camera_mp_imp years_old
0 Honor Android 14.5000 yes no 5.0000 64.0000 4.0000 3020.0000 146.0000 127 4.3076 4.7151 13.0000 4
1 Honor Android 17.3000 yes yes 16.0000 128.0000 4.0000 4300.0000 213.0000 325 5.1621 5.5190 13.0000 4
2 Honor Android 16.6900 yes yes 8.0000 128.0000 4.0000 4200.0000 213.0000 162 5.1111 5.8846 13.0000 4
4 Honor Android 15.3200 yes no 8.0000 64.0000 4.0000 5000.0000 185.0000 293 4.3900 4.9478 13.0000 4
5 Honor Android 16.2300 yes no 8.0000 64.0000 4.0000 4000.0000 176.0000 223 4.4139 5.0607 13.0000 4
In [86]:
df3['tech_4G5G']=((df3['4g']=="yes")&(df3['5g']=="yes")).astype(int)
In [87]:
df3['tech_4G']=((df3['4g']=="yes")&(df3['5g']=="no")).astype(int)
In [88]:
df3['tech_2G3G']=((df3['4g']=="no")&(df3['5g']=="no")).astype(int)
In [89]:
df3.drop('4g', axis=1, inplace=True)
In [90]:
df3.drop('5g', axis=1, inplace=True)
In [91]:
df3.head()
Out[91]:
brand_name os screen_size selfie_camera_mp int_memory ram battery weight days_used normalized_used_price normalized_new_price main_camera_mp_imp years_old tech_4G5G tech_4G tech_2G3G
0 Honor Android 14.5000 5.0000 64.0000 4.0000 3020.0000 146.0000 127 4.3076 4.7151 13.0000 4 0 1 0
1 Honor Android 17.3000 16.0000 128.0000 4.0000 4300.0000 213.0000 325 5.1621 5.5190 13.0000 4 1 0 0
2 Honor Android 16.6900 8.0000 128.0000 4.0000 4200.0000 213.0000 162 5.1111 5.8846 13.0000 4 1 0 0
4 Honor Android 15.3200 8.0000 64.0000 4.0000 5000.0000 185.0000 293 4.3900 4.9478 13.0000 4 0 1 0
5 Honor Android 16.2300 8.0000 64.0000 4.0000 4000.0000 176.0000 223 4.4139 5.0607 13.0000 4 0 1 0

NOTES:

  • Created variable years_old to faster qualification by age
  • Created technology variable to clasify devices by technology or combination of technologies among: 4G, 4G5G, 2G3G. Is asumed devices not 4G neither 5G are 2G and/or 3G and labeled 2G3G. 5G only is descarted as already observed there are no devices in this category.
  • Technology variables created as dummy variabl.s
In [92]:
df3 = pd.get_dummies(df3, columns=['brand_name','os'], drop_first=True)
df3
Out[92]:
screen_size selfie_camera_mp int_memory ram battery weight days_used normalized_used_price normalized_new_price main_camera_mp_imp ... brand_name_Samsung brand_name_Sony brand_name_Spice brand_name_Vivo brand_name_XOLO brand_name_Xiaomi brand_name_ZTE os_Others os_Windows os_iOS
0 14.5000 5.0000 64.0000 4.0000 3020.0000 146.0000 127 4.3076 4.7151 13.0000 ... 0 0 0 0 0 0 0 0 0 0
1 17.3000 16.0000 128.0000 4.0000 4300.0000 213.0000 325 5.1621 5.5190 13.0000 ... 0 0 0 0 0 0 0 0 0 0
2 16.6900 8.0000 128.0000 4.0000 4200.0000 213.0000 162 5.1111 5.8846 13.0000 ... 0 0 0 0 0 0 0 0 0 0
4 15.3200 8.0000 64.0000 4.0000 5000.0000 185.0000 293 4.3900 4.9478 13.0000 ... 0 0 0 0 0 0 0 0 0 0
5 16.2300 8.0000 64.0000 4.0000 4000.0000 176.0000 223 4.4139 5.0607 13.0000 ... 0 0 0 0 0 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
3449 15.3400 8.0000 64.0000 4.0000 5000.0000 190.0000 232 4.4923 6.4839 14.7778 ... 0 0 0 0 0 0 0 0 0 0
3450 15.2400 8.0000 128.0000 4.0000 4000.0000 200.0000 541 5.0377 6.2515 13.0000 ... 0 0 0 0 0 0 0 0 0 0
3451 15.8000 5.0000 32.0000 4.0000 4000.0000 165.0000 201 4.3573 4.5288 13.0000 ... 0 0 0 0 0 0 0 0 0 0
3452 15.8000 5.0000 32.0000 4.0000 4000.0000 160.0000 149 4.3498 4.6242 13.0000 ... 0 0 0 0 0 0 0 0 0 0
3453 12.8300 5.0000 16.0000 4.0000 4000.0000 168.0000 176 4.1321 4.2800 13.0000 ... 0 0 0 0 0 0 0 0 0 0

3111 rows × 50 columns

NOTES:

  • created dummy variable from 'brand_name'
  • created dummy variable from 'os'

Consolidated Notes from Data Preprocessing¶

Outlier detection and treatment

  • From visualization is was noticed there is a lot of outliers in many variables. Detailed analysis required.
  • Quantified outliers is noticed only two variables have no outliers, while the rest have up to 639 (18% of samples).
  • Outliers will be removed using Z-Score Method, considering outliers all data points with z-score greater than 3 or less than -3.
  • A total of 343 rows removed from dataframe as considered outliers.
  • The clip function of NumPy is aplied to set all the values smaller than lower_whisker will be assigned the value of lower_whisker, and all the values greater than upper_whisker will be assigned the value of upper_whisker.

Feature engineering

  • Created variable years_old to faster qualification by age
  • Created technology variable to clasify devices by technology or combination of technologies among: 4G, 4G5G, 2G3G. Is asumed devices not 4G neither 5G are 2G and/or 3G and labeled 2G3G. 5G only is descarted as already observed there are no devices in this category.
  • Technology variables created as dummy variables

Preparing data for modeling

  • created dummy variable from 'brand_name'
  • created dummy variable from 'os'

EDA¶

  • It is a good idea to explore the data once again after manipulating it.
In [93]:
df3.shape
Out[93]:
(3111, 50)
In [94]:
df3.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3111 entries, 0 to 3453
Data columns (total 50 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   screen_size            3111 non-null   float64
 1   selfie_camera_mp       3111 non-null   float64
 2   int_memory             3111 non-null   float64
 3   ram                    3111 non-null   float64
 4   battery                3111 non-null   float64
 5   weight                 3111 non-null   float64
 6   days_used              3111 non-null   int64  
 7   normalized_used_price  3111 non-null   float64
 8   normalized_new_price   3111 non-null   float64
 9   main_camera_mp_imp     3111 non-null   float64
 10  years_old              3111 non-null   int64  
 11  tech_4G5G              3111 non-null   int32  
 12  tech_4G                3111 non-null   int32  
 13  tech_2G3G              3111 non-null   int32  
 14  brand_name_Alcatel     3111 non-null   uint8  
 15  brand_name_Apple       3111 non-null   uint8  
 16  brand_name_Asus        3111 non-null   uint8  
 17  brand_name_BlackBerry  3111 non-null   uint8  
 18  brand_name_Celkon      3111 non-null   uint8  
 19  brand_name_Coolpad     3111 non-null   uint8  
 20  brand_name_Gionee      3111 non-null   uint8  
 21  brand_name_Google      3111 non-null   uint8  
 22  brand_name_HTC         3111 non-null   uint8  
 23  brand_name_Honor       3111 non-null   uint8  
 24  brand_name_Huawei      3111 non-null   uint8  
 25  brand_name_Infinix     3111 non-null   uint8  
 26  brand_name_Karbonn     3111 non-null   uint8  
 27  brand_name_LG          3111 non-null   uint8  
 28  brand_name_Lava        3111 non-null   uint8  
 29  brand_name_Lenovo      3111 non-null   uint8  
 30  brand_name_Meizu       3111 non-null   uint8  
 31  brand_name_Micromax    3111 non-null   uint8  
 32  brand_name_Microsoft   3111 non-null   uint8  
 33  brand_name_Motorola    3111 non-null   uint8  
 34  brand_name_Nokia       3111 non-null   uint8  
 35  brand_name_OnePlus     3111 non-null   uint8  
 36  brand_name_Oppo        3111 non-null   uint8  
 37  brand_name_Others      3111 non-null   uint8  
 38  brand_name_Panasonic   3111 non-null   uint8  
 39  brand_name_Realme      3111 non-null   uint8  
 40  brand_name_Samsung     3111 non-null   uint8  
 41  brand_name_Sony        3111 non-null   uint8  
 42  brand_name_Spice       3111 non-null   uint8  
 43  brand_name_Vivo        3111 non-null   uint8  
 44  brand_name_XOLO        3111 non-null   uint8  
 45  brand_name_Xiaomi      3111 non-null   uint8  
 46  brand_name_ZTE         3111 non-null   uint8  
 47  os_Others              3111 non-null   uint8  
 48  os_Windows             3111 non-null   uint8  
 49  os_iOS                 3111 non-null   uint8  
dtypes: float64(9), int32(3), int64(2), uint8(36)
memory usage: 437.5 KB
In [95]:
df3.describe().T
Out[95]:
count mean std min 25% 50% 75% max
screen_size 3111.0000 13.3019 2.5214 8.8150 12.7000 12.8300 15.2900 19.2500
selfie_camera_mp 3111.0000 5.8060 5.1000 0.0000 2.0000 5.0000 8.0000 17.0000
int_memory 3111.0000 40.6709 35.1027 0.0100 16.0000 32.0000 64.0000 136.0000
ram 3111.0000 4.0000 0.0000 4.0000 4.0000 4.0000 4.0000 4.0000
battery 3111.0000 2974.3420 1013.9041 500.0000 2100.0000 3000.0000 3800.0000 6650.0000
weight 3111.0000 163.1934 34.5938 82.5000 141.0000 158.0000 180.0000 244.5000
days_used 3111.0000 686.0501 241.9998 91.0000 554.0000 700.0000 873.5000 1094.0000
normalized_used_price 3111.0000 4.3423 0.5006 3.0462 4.0289 4.3664 4.6841 5.7667
normalized_new_price 3111.0000 5.2147 0.6140 3.5242 4.7892 5.1961 5.6325 6.9575
main_camera_mp_imp 3111.0000 9.8252 4.5375 0.3000 5.0000 12.0000 13.0000 24.0000
years_old 3111.0000 8.1032 2.2261 4.0000 6.0000 9.0000 10.0000 11.0000
tech_4G5G 3111.0000 0.0302 0.1712 0.0000 0.0000 0.0000 0.0000 1.0000
tech_4G 3111.0000 0.6541 0.4757 0.0000 0.0000 1.0000 1.0000 1.0000
tech_2G3G 3111.0000 0.3157 0.4649 0.0000 0.0000 0.0000 1.0000 1.0000
brand_name_Alcatel 3111.0000 0.0366 0.1879 0.0000 0.0000 0.0000 0.0000 1.0000
brand_name_Apple 3111.0000 0.0077 0.0875 0.0000 0.0000 0.0000 0.0000 1.0000
brand_name_Asus 3111.0000 0.0357 0.1855 0.0000 0.0000 0.0000 0.0000 1.0000
brand_name_BlackBerry 3111.0000 0.0068 0.0819 0.0000 0.0000 0.0000 0.0000 1.0000
brand_name_Celkon 3111.0000 0.0064 0.0799 0.0000 0.0000 0.0000 0.0000 1.0000
brand_name_Coolpad 3111.0000 0.0071 0.0838 0.0000 0.0000 0.0000 0.0000 1.0000
brand_name_Gionee 3111.0000 0.0177 0.1318 0.0000 0.0000 0.0000 0.0000 1.0000
brand_name_Google 3111.0000 0.0045 0.0669 0.0000 0.0000 0.0000 0.0000 1.0000
brand_name_HTC 3111.0000 0.0354 0.1847 0.0000 0.0000 0.0000 0.0000 1.0000
brand_name_Honor 3111.0000 0.0321 0.1764 0.0000 0.0000 0.0000 0.0000 1.0000
brand_name_Huawei 3111.0000 0.0659 0.2481 0.0000 0.0000 0.0000 0.0000 1.0000
brand_name_Infinix 3111.0000 0.0032 0.0566 0.0000 0.0000 0.0000 0.0000 1.0000
brand_name_Karbonn 3111.0000 0.0087 0.0928 0.0000 0.0000 0.0000 0.0000 1.0000
brand_name_LG 3111.0000 0.0614 0.2401 0.0000 0.0000 0.0000 0.0000 1.0000
brand_name_Lava 3111.0000 0.0100 0.0993 0.0000 0.0000 0.0000 0.0000 1.0000
brand_name_Lenovo 3111.0000 0.0473 0.2122 0.0000 0.0000 0.0000 0.0000 1.0000
brand_name_Meizu 3111.0000 0.0199 0.1398 0.0000 0.0000 0.0000 0.0000 1.0000
brand_name_Micromax 3111.0000 0.0328 0.1781 0.0000 0.0000 0.0000 0.0000 1.0000
brand_name_Microsoft 3111.0000 0.0068 0.0819 0.0000 0.0000 0.0000 0.0000 1.0000
brand_name_Motorola 3111.0000 0.0328 0.1781 0.0000 0.0000 0.0000 0.0000 1.0000
brand_name_Nokia 3111.0000 0.0318 0.1756 0.0000 0.0000 0.0000 0.0000 1.0000
brand_name_OnePlus 3111.0000 0.0064 0.0799 0.0000 0.0000 0.0000 0.0000 1.0000
brand_name_Oppo 3111.0000 0.0341 0.1814 0.0000 0.0000 0.0000 0.0000 1.0000
brand_name_Others 3111.0000 0.1472 0.3544 0.0000 0.0000 0.0000 0.0000 1.0000
brand_name_Panasonic 3111.0000 0.0151 0.1220 0.0000 0.0000 0.0000 0.0000 1.0000
brand_name_Realme 3111.0000 0.0132 0.1141 0.0000 0.0000 0.0000 0.0000 1.0000
brand_name_Samsung 3111.0000 0.0923 0.2894 0.0000 0.0000 0.0000 0.0000 1.0000
brand_name_Sony 3111.0000 0.0257 0.1583 0.0000 0.0000 0.0000 0.0000 1.0000
brand_name_Spice 3111.0000 0.0084 0.0911 0.0000 0.0000 0.0000 0.0000 1.0000
brand_name_Vivo 3111.0000 0.0341 0.1814 0.0000 0.0000 0.0000 0.0000 1.0000
brand_name_XOLO 3111.0000 0.0154 0.1233 0.0000 0.0000 0.0000 0.0000 1.0000
brand_name_Xiaomi 3111.0000 0.0392 0.1941 0.0000 0.0000 0.0000 0.0000 1.0000
brand_name_ZTE 3111.0000 0.0440 0.2052 0.0000 0.0000 0.0000 0.0000 1.0000
os_Others 3111.0000 0.0305 0.1721 0.0000 0.0000 0.0000 0.0000 1.0000
os_Windows 3111.0000 0.0206 0.1420 0.0000 0.0000 0.0000 0.0000 1.0000
os_iOS 3111.0000 0.0077 0.0875 0.0000 0.0000 0.0000 0.0000 1.0000

Model Building - Linear Regression¶

In [96]:
#defining X and y variables
X = df3.drop(["normalized_used_price"], axis=1)
y = df3["normalized_used_price"]
In [97]:
X
Out[97]:
screen_size selfie_camera_mp int_memory ram battery weight days_used normalized_new_price main_camera_mp_imp years_old ... brand_name_Samsung brand_name_Sony brand_name_Spice brand_name_Vivo brand_name_XOLO brand_name_Xiaomi brand_name_ZTE os_Others os_Windows os_iOS
0 14.5000 5.0000 64.0000 4.0000 3020.0000 146.0000 127 4.7151 13.0000 4 ... 0 0 0 0 0 0 0 0 0 0
1 17.3000 16.0000 128.0000 4.0000 4300.0000 213.0000 325 5.5190 13.0000 4 ... 0 0 0 0 0 0 0 0 0 0
2 16.6900 8.0000 128.0000 4.0000 4200.0000 213.0000 162 5.8846 13.0000 4 ... 0 0 0 0 0 0 0 0 0 0
4 15.3200 8.0000 64.0000 4.0000 5000.0000 185.0000 293 4.9478 13.0000 4 ... 0 0 0 0 0 0 0 0 0 0
5 16.2300 8.0000 64.0000 4.0000 4000.0000 176.0000 223 5.0607 13.0000 4 ... 0 0 0 0 0 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
3449 15.3400 8.0000 64.0000 4.0000 5000.0000 190.0000 232 6.4839 14.7778 5 ... 0 0 0 0 0 0 0 0 0 0
3450 15.2400 8.0000 128.0000 4.0000 4000.0000 200.0000 541 6.2515 13.0000 6 ... 0 0 0 0 0 0 0 0 0 0
3451 15.8000 5.0000 32.0000 4.0000 4000.0000 165.0000 201 4.5288 13.0000 4 ... 0 0 0 0 0 0 0 0 0 0
3452 15.8000 5.0000 32.0000 4.0000 4000.0000 160.0000 149 4.6242 13.0000 4 ... 0 0 0 0 0 0 0 0 0 0
3453 12.8300 5.0000 16.0000 4.0000 4000.0000 168.0000 176 4.2800 13.0000 4 ... 0 0 0 0 0 0 0 0 0 0

3111 rows × 49 columns

In [98]:
y
Out[98]:
0      4.3076
1      5.1621
2      5.1111
4      4.3900
5      4.4139
        ...  
3449   4.4923
3450   5.0377
3451   4.3573
3452   4.3498
3453   4.1321
Name: normalized_used_price, Length: 3111, dtype: float64
In [99]:
#add the intercept to data
X = sm.add_constant(X,has_constant='add')
X
Out[99]:
const screen_size selfie_camera_mp int_memory ram battery weight days_used normalized_new_price main_camera_mp_imp ... brand_name_Samsung brand_name_Sony brand_name_Spice brand_name_Vivo brand_name_XOLO brand_name_Xiaomi brand_name_ZTE os_Others os_Windows os_iOS
0 1.0000 14.5000 5.0000 64.0000 4.0000 3020.0000 146.0000 127 4.7151 13.0000 ... 0 0 0 0 0 0 0 0 0 0
1 1.0000 17.3000 16.0000 128.0000 4.0000 4300.0000 213.0000 325 5.5190 13.0000 ... 0 0 0 0 0 0 0 0 0 0
2 1.0000 16.6900 8.0000 128.0000 4.0000 4200.0000 213.0000 162 5.8846 13.0000 ... 0 0 0 0 0 0 0 0 0 0
4 1.0000 15.3200 8.0000 64.0000 4.0000 5000.0000 185.0000 293 4.9478 13.0000 ... 0 0 0 0 0 0 0 0 0 0
5 1.0000 16.2300 8.0000 64.0000 4.0000 4000.0000 176.0000 223 5.0607 13.0000 ... 0 0 0 0 0 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
3449 1.0000 15.3400 8.0000 64.0000 4.0000 5000.0000 190.0000 232 6.4839 14.7778 ... 0 0 0 0 0 0 0 0 0 0
3450 1.0000 15.2400 8.0000 128.0000 4.0000 4000.0000 200.0000 541 6.2515 13.0000 ... 0 0 0 0 0 0 0 0 0 0
3451 1.0000 15.8000 5.0000 32.0000 4.0000 4000.0000 165.0000 201 4.5288 13.0000 ... 0 0 0 0 0 0 0 0 0 0
3452 1.0000 15.8000 5.0000 32.0000 4.0000 4000.0000 160.0000 149 4.6242 13.0000 ... 0 0 0 0 0 0 0 0 0 0
3453 1.0000 12.8300 5.0000 16.0000 4.0000 4000.0000 168.0000 176 4.2800 13.0000 ... 0 0 0 0 0 0 0 0 0 0

3111 rows × 50 columns

In [100]:
#splitting the data in 70:30 ratio for train to test data
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)
x_train.shape[0]+x_test.shape[0]-X.shape[0]
Out[100]:
0
In [101]:
x_train.shape
Out[101]:
(2177, 50)
In [102]:
y_train.shape
Out[102]:
(2177,)
In [103]:
olsmodel = sm.OLS(y_train, x_train).fit()
print(olsmodel.summary())
                              OLS Regression Results                             
=================================================================================
Dep. Variable:     normalized_used_price   R-squared:                       0.821
Model:                               OLS   Adj. R-squared:                  0.818
Method:                    Least Squares   F-statistic:                     212.9
Date:                   Sat, 08 Jun 2024   Prob (F-statistic):               0.00
Time:                           04:51:53   Log-Likelihood:                 265.91
No. Observations:                   2177   AIC:                            -437.8
Df Residuals:                       2130   BIC:                            -170.6
Df Model:                             46                                         
Covariance Type:               nonrobust                                         
=========================================================================================
                            coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------------
const                     0.0874      0.004     20.161      0.000       0.079       0.096
screen_size               0.0305      0.004      7.867      0.000       0.023       0.038
selfie_camera_mp          0.0164      0.002      9.424      0.000       0.013       0.020
int_memory                0.0017      0.000      8.290      0.000       0.001       0.002
ram                       0.3497      0.017     20.161      0.000       0.316       0.384
battery               -4.003e-06   9.39e-06     -0.426      0.670   -2.24e-05    1.44e-05
weight                    0.0019      0.000      7.128      0.000       0.001       0.002
days_used              4.218e-05   2.94e-05      1.433      0.152   -1.56e-05    9.99e-05
normalized_new_price      0.3298      0.013     25.162      0.000       0.304       0.356
main_camera_mp_imp        0.0254      0.002     15.652      0.000       0.022       0.029
years_old                 0.0019      0.005      0.398      0.690      -0.008       0.011
tech_4G5G                 0.0363      0.022      1.620      0.105      -0.008       0.080
tech_4G                   0.0574      0.011      5.116      0.000       0.035       0.079
tech_2G3G                -0.0062      0.015     -0.408      0.683      -0.036       0.024
brand_name_Alcatel       -0.0726      0.044     -1.638      0.101      -0.159       0.014
brand_name_Apple          0.0171      0.034      0.505      0.614      -0.049       0.084
brand_name_Asus          -0.0300      0.045     -0.662      0.508      -0.119       0.059
brand_name_BlackBerry    -0.1544      0.069     -2.254      0.024      -0.289      -0.020
brand_name_Celkon        -0.0035      0.072     -0.048      0.961      -0.145       0.138
brand_name_Coolpad       -0.0679      0.065     -1.042      0.297      -0.196       0.060
brand_name_Gionee        -0.0508      0.052     -0.972      0.331      -0.153       0.052
brand_name_Google         0.0111      0.097      0.115      0.909      -0.179       0.202
brand_name_HTC           -0.0390      0.045     -0.859      0.391      -0.128       0.050
brand_name_Honor         -0.0532      0.047     -1.129      0.259      -0.146       0.039
brand_name_Huawei        -0.0669      0.042     -1.593      0.111      -0.149       0.015
brand_name_Infinix       -0.0179      0.098     -0.184      0.854      -0.209       0.173
brand_name_Karbonn       -0.0440      0.060     -0.734      0.463      -0.161       0.074
brand_name_LG            -0.0631      0.042     -1.500      0.134      -0.146       0.019
brand_name_Lava           0.0273      0.060      0.458      0.647      -0.090       0.144
brand_name_Lenovo        -0.0478      0.043     -1.104      0.270      -0.133       0.037
brand_name_Meizu         -0.0738      0.052     -1.424      0.155      -0.175       0.028
brand_name_Micromax      -0.0474      0.045     -1.041      0.298      -0.137       0.042
brand_name_Microsoft      0.0233      0.078      0.301      0.763      -0.129       0.175
brand_name_Motorola      -0.1026      0.046     -2.217      0.027      -0.193      -0.012
brand_name_Nokia         -0.0124      0.047     -0.264      0.792      -0.105       0.080
brand_name_OnePlus       -0.0852      0.066     -1.283      0.200      -0.216       0.045
brand_name_Oppo          -0.0086      0.046     -0.186      0.852      -0.100       0.082
brand_name_Others        -0.0614      0.039     -1.555      0.120      -0.139       0.016
brand_name_Panasonic     -0.0750      0.052     -1.436      0.151      -0.177       0.027
brand_name_Realme        -0.0601      0.060     -1.004      0.316      -0.177       0.057
brand_name_Samsung       -0.0521      0.041     -1.284      0.199      -0.132       0.027
brand_name_Sony          -0.1152      0.048     -2.384      0.017      -0.210      -0.020
brand_name_Spice         -0.0995      0.061     -1.634      0.102      -0.219       0.020
brand_name_Vivo          -0.0974      0.045     -2.144      0.032      -0.186      -0.008
brand_name_XOLO          -0.1044      0.052     -1.988      0.047      -0.207      -0.001
brand_name_Xiaomi        -0.0106      0.045     -0.235      0.814      -0.099       0.078
brand_name_ZTE           -0.0637      0.044     -1.448      0.148      -0.150       0.023
os_Others                 0.0190      0.032      0.592      0.554      -0.044       0.082
os_Windows               -0.0241      0.041     -0.592      0.554      -0.104       0.056
os_iOS                    0.0171      0.034      0.505      0.614      -0.049       0.084
==============================================================================
Omnibus:                       98.734   Durbin-Watson:                   1.975
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              119.131
Skew:                          -0.485   Prob(JB):                     1.35e-26
Kurtosis:                       3.609   Cond. No.                     7.48e+17
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 3.95e-26. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.

Model Performance Check¶

In [104]:
variable_names = x_train.columns
coefficients = olsmodel.params
p_values = olsmodel.pvalues
standard_errors = olsmodel.bse
t_values = olsmodel.tvalues
results_dict = {'Variable': variable_names, 'Coefficient': coefficients, 'Standard Error': standard_errors, 'T-value': t_values, 'P-value': p_values}
results_df = pd.DataFrame(results_dict)
results_df
Out[104]:
Variable Coefficient Standard Error T-value P-value
const const 0.0874 0.0043 20.1614 0.0000
screen_size screen_size 0.0305 0.0039 7.8666 0.0000
selfie_camera_mp selfie_camera_mp 0.0164 0.0017 9.4240 0.0000
int_memory int_memory 0.0017 0.0002 8.2899 0.0000
ram ram 0.3497 0.0173 20.1614 0.0000
battery battery -0.0000 0.0000 -0.4264 0.6699
weight weight 0.0019 0.0003 7.1279 0.0000
days_used days_used 0.0000 0.0000 1.4329 0.1520
normalized_new_price normalized_new_price 0.3298 0.0131 25.1625 0.0000
main_camera_mp_imp main_camera_mp_imp 0.0254 0.0016 15.6523 0.0000
years_old years_old 0.0019 0.0048 0.3984 0.6904
tech_4G5G tech_4G5G 0.0363 0.0224 1.6199 0.1054
tech_4G tech_4G 0.0574 0.0112 5.1164 0.0000
tech_2G3G tech_2G3G -0.0062 0.0152 -0.4080 0.6833
brand_name_Alcatel brand_name_Alcatel -0.0726 0.0443 -1.6384 0.1015
brand_name_Apple brand_name_Apple 0.0171 0.0339 0.5048 0.6138
brand_name_Asus brand_name_Asus -0.0300 0.0452 -0.6625 0.5077
brand_name_BlackBerry brand_name_BlackBerry -0.1544 0.0685 -2.2543 0.0243
brand_name_Celkon brand_name_Celkon -0.0035 0.0721 -0.0484 0.9614
brand_name_Coolpad brand_name_Coolpad -0.0679 0.0651 -1.0425 0.2973
brand_name_Gionee brand_name_Gionee -0.0508 0.0523 -0.9715 0.3314
brand_name_Google brand_name_Google 0.0111 0.0972 0.1145 0.9088
brand_name_HTC brand_name_HTC -0.0390 0.0454 -0.8585 0.3907
brand_name_Honor brand_name_Honor -0.0532 0.0472 -1.1292 0.2590
brand_name_Huawei brand_name_Huawei -0.0669 0.0420 -1.5932 0.1113
brand_name_Infinix brand_name_Infinix -0.0179 0.0975 -0.1839 0.8541
brand_name_Karbonn brand_name_Karbonn -0.0440 0.0599 -0.7336 0.4633
brand_name_LG brand_name_LG -0.0631 0.0421 -1.5004 0.1337
brand_name_Lava brand_name_Lava 0.0273 0.0596 0.4578 0.6472
brand_name_Lenovo brand_name_Lenovo -0.0478 0.0433 -1.1042 0.2696
brand_name_Meizu brand_name_Meizu -0.0738 0.0518 -1.4239 0.1546
brand_name_Micromax brand_name_Micromax -0.0474 0.0455 -1.0414 0.2978
brand_name_Microsoft brand_name_Microsoft 0.0233 0.0775 0.3009 0.7635
brand_name_Motorola brand_name_Motorola -0.1026 0.0463 -2.2166 0.0268
brand_name_Nokia brand_name_Nokia -0.0124 0.0471 -0.2640 0.7918
brand_name_OnePlus brand_name_OnePlus -0.0852 0.0664 -1.2828 0.1997
brand_name_Oppo brand_name_Oppo -0.0086 0.0464 -0.1864 0.8521
brand_name_Others brand_name_Others -0.0614 0.0395 -1.5547 0.1202
brand_name_Panasonic brand_name_Panasonic -0.0750 0.0522 -1.4360 0.1512
brand_name_Realme brand_name_Realme -0.0601 0.0598 -1.0035 0.3157
brand_name_Samsung brand_name_Samsung -0.0521 0.0405 -1.2840 0.1993
brand_name_Sony brand_name_Sony -0.1152 0.0483 -2.3840 0.0172
brand_name_Spice brand_name_Spice -0.0995 0.0609 -1.6337 0.1025
brand_name_Vivo brand_name_Vivo -0.0974 0.0454 -2.1445 0.0321
brand_name_XOLO brand_name_XOLO -0.1044 0.0525 -1.9881 0.0469
brand_name_Xiaomi brand_name_Xiaomi -0.0106 0.0453 -0.2347 0.8145
brand_name_ZTE brand_name_ZTE -0.0637 0.0440 -1.4477 0.1478
os_Others os_Others 0.0190 0.0321 0.5915 0.5542
os_Windows os_Windows -0.0241 0.0408 -0.5915 0.5542
os_iOS os_iOS 0.0171 0.0339 0.5048 0.6138
In [105]:
# DataFrame to store the metrics
compare_df = pd.DataFrame(columns=['Model', 'RMSE_Train', 'MAE_Train', 'R2_Train', 'Adj_R2_Train', 'MAPE_Train','RMSE_Test', 'MAE_Test', 'R2_Test', 'Adj_R2_Test', 'MAPE_Test'])

# Define a function to calculate MAPE
def mean_absolute_percentage_error(y_true, y_pred):
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100

# Make predictions
y_train_pred = olsmodel.predict(x_train)
y_test_pred = olsmodel.predict(x_test)

# Calculate metrics for training data
rmse_train = np.sqrt(mean_squared_error(y_train, y_train_pred))
mae_train = mean_absolute_error(y_train, y_train_pred)
r2_train = r2_score(y_train, y_train_pred)
adj_r2_train = 1 - (1-r2_train)*(len(y_train)-1)/(len(y_train)-x_train.shape[1]-1)
mape_train = mean_absolute_percentage_error(y_train, y_train_pred)

# Calculate metrics for testing data
rmse_test = np.sqrt(mean_squared_error(y_test, y_test_pred))
mae_test = mean_absolute_error(y_test, y_test_pred)
r2_test = r2_score(y_test, y_test_pred)
adj_r2_test = 1 - (1-r2_test)*(len(y_test)-1)/(len(y_test)-x_test.shape[1]-1)
mape_test = mean_absolute_percentage_error(y_test, y_test_pred)

# Append the results to the DataFrame
compare_df = compare_df.append({
        'Model': "olsmodel",
        'RMSE_Train': rmse_train,
        'MAE_Train': mae_train,
        'R2_Train': r2_train,
        'Adj_R2_Train': adj_r2_train,
        'MAPE_Train': mape_train,
        'RMSE_Test': rmse_test,
        'MAE_Test': mae_test,
        'R2_Test': r2_test,
        'Adj_R2_Test': adj_r2_test,
        'MAPE_Test': mape_test
    }, ignore_index=True)

compare_df
Out[105]:
Model RMSE_Train MAE_Train R2_Train Adj_R2_Train MAPE_Train RMSE_Test MAE_Test R2_Test Adj_R2_Test MAPE_Test
0 olsmodel 0.2141 0.1704 0.8214 0.8172 4.0070 0.2140 0.1695 0.8061 0.7951 4.0127
In [106]:
#check the RMSE on the train and test data
rmse_train = np.sqrt(mean_squared_error(y_train, olsmodel.predict(x_train)))
rmse_test = np.sqrt(mean_squared_error(y_test, olsmodel.predict(x_test)))
print(f'RMSE on the train data is {rmse_train} \nRMSE on the test data is {rmse_test}')
RMSE on the train data is 0.21414881459330756 
RMSE on the test data is 0.21396552992142798
In [107]:
#check the MAE on the train and test data
mae_train = mean_absolute_error(y_train, olsmodel.predict(x_train))
mae_test = mean_absolute_error(y_test, olsmodel.predict(x_test))
print(f'MAE on the train data is {mae_train} \nMAE on the test data is {mae_test}')
MAE on the train data is 0.17038724209562398 
MAE on the test data is 0.1695200917440956
In [108]:
#check the R2 on the train and test data
R2_train = r2_score(y_train, olsmodel.predict(x_train))
R2_test = r2_score(y_test, olsmodel.predict(x_test))
print(f'R2 on the train data is {R2_train} \nR2 on the test data is {R2_test}')
R2 on the train data is 0.8213657734739421 
R2 on the test data is 0.8060737743734712
In [109]:
#check the MAPE on the train and test data
MAPE_train = mean_absolute_percentage_error(y_train, olsmodel.predict(x_train))
MAPE_test = mean_absolute_percentage_error(y_test, olsmodel.predict(x_test))
print(f'MAPE on the train data is {MAPE_train} \nMAPE on the test data is {MAPE_test}')
MAPE on the train data is 4.00697435659076 
MAPE on the test data is 4.012732729785307

Notes from Model Performance Check¶

  • The training $R^2$ is 0.82, so the model is not underfitting
  • The train and test RMSE and MAE are comparable, so the model is not overfitting either
  • MAE suggests that the model can predict used devices prices within a mean error of 0.17 on the test data
  • MAPE of 4.01 on the test data means that we are able to predict within 4.01% of the normalized_used_price

Checking Linear Regression Assumptions¶

  • In order to make statistical inferences from a linear regression model, it is important to ensure that the assumptions of linear regression are satisfied.

Check for No Multicollinearity¶

In [110]:
pd.Series([variance_inflation_factor(x_train.values, i) for i in range(x_train.shape[1])],index=x_train.columns)
Out[110]:
const                   0.0000
screen_size             4.4177
selfie_camera_mp        3.5707
int_memory              2.5605
ram                     0.0000
battery                 4.1794
weight                  3.8233
days_used               2.3341
normalized_new_price    3.1119
main_camera_mp_imp      2.5496
years_old               5.3134
tech_4G5G                  inf
tech_4G                    inf
tech_2G3G                  inf
brand_name_Alcatel      3.3443
brand_name_Apple           inf
brand_name_Asus         3.1173
brand_name_BlackBerry   1.5900
brand_name_Celkon       1.4330
brand_name_Coolpad      1.5249
brand_name_Gionee       2.0661
brand_name_Google       1.2061
brand_name_HTC          3.1428
brand_name_Honor        2.9471
brand_name_Huawei       5.1173
brand_name_Infinix      1.2140
brand_name_Karbonn      1.6686
brand_name_LG           4.8793
brand_name_Lava         1.6491
brand_name_Lenovo       3.8156
brand_name_Meizu        2.1960
brand_name_Micromax     3.0313
brand_name_Microsoft    1.9091
brand_name_Motorola     3.1369
brand_name_Nokia        3.5663
brand_name_OnePlus      1.5889
brand_name_Oppo         3.0653
brand_name_Others       8.6012
brand_name_Panasonic    2.0614
brand_name_Realme       1.8132
brand_name_Samsung      6.7088
brand_name_Sony         2.7663
brand_name_Spice        1.6461
brand_name_Vivo         3.5532
brand_name_XOLO         2.0250
brand_name_Xiaomi       3.6126
brand_name_ZTE          3.7526
os_Others               1.5121
os_Windows              1.6691
os_iOS                     inf
dtype: float64
In [111]:
x_train1 = x_train.drop(["years_old"], axis=1)
olsmodel1 = sm.OLS(y_train, x_train1).fit()
print(olsmodel1.summary())
                              OLS Regression Results                             
=================================================================================
Dep. Variable:     normalized_used_price   R-squared:                       0.821
Model:                               OLS   Adj. R-squared:                  0.818
Method:                    Least Squares   F-statistic:                     217.7
Date:                   Sat, 08 Jun 2024   Prob (F-statistic):               0.00
Time:                           04:51:54   Log-Likelihood:                 265.83
No. Observations:                   2177   AIC:                            -439.7
Df Residuals:                       2131   BIC:                            -178.1
Df Model:                             45                                         
Covariance Type:               nonrobust                                         
=========================================================================================
                            coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------------
const                     0.0878      0.004     20.842      0.000       0.080       0.096
screen_size               0.0303      0.004      7.865      0.000       0.023       0.038
selfie_camera_mp          0.0161      0.002      9.984      0.000       0.013       0.019
int_memory                0.0017      0.000      8.445      0.000       0.001       0.002
ram                       0.3513      0.017     20.842      0.000       0.318       0.384
battery               -4.697e-06   9.22e-06     -0.509      0.611   -2.28e-05    1.34e-05
weight                    0.0019      0.000      7.167      0.000       0.001       0.002
days_used              4.757e-05   2.61e-05      1.819      0.069   -3.71e-06    9.88e-05
normalized_new_price      0.3319      0.012     27.581      0.000       0.308       0.356
main_camera_mp_imp        0.0253      0.002     15.678      0.000       0.022       0.028
tech_4G5G                 0.0348      0.022      1.575      0.115      -0.009       0.078
tech_4G                   0.0571      0.011      5.104      0.000       0.035       0.079
tech_2G3G                -0.0041      0.014     -0.286      0.775      -0.032       0.024
brand_name_Alcatel       -0.0733      0.044     -1.657      0.098      -0.160       0.013
brand_name_Apple          0.0157      0.034      0.465      0.642      -0.050       0.082
brand_name_Asus          -0.0306      0.045     -0.676      0.499      -0.119       0.058
brand_name_BlackBerry    -0.1543      0.068     -2.253      0.024      -0.289      -0.020
brand_name_Celkon        -0.0025      0.072     -0.035      0.972      -0.144       0.139
brand_name_Coolpad       -0.0687      0.065     -1.056      0.291      -0.196       0.059
brand_name_Gionee        -0.0511      0.052     -0.978      0.328      -0.154       0.051
brand_name_Google         0.0073      0.097      0.075      0.940      -0.182       0.197
brand_name_HTC           -0.0393      0.045     -0.865      0.387      -0.128       0.050
brand_name_Honor         -0.0539      0.047     -1.145      0.252      -0.146       0.038
brand_name_Huawei        -0.0673      0.042     -1.604      0.109      -0.150       0.015
brand_name_Infinix       -0.0186      0.097     -0.191      0.849      -0.210       0.173
brand_name_Karbonn       -0.0430      0.060     -0.718      0.473      -0.160       0.074
brand_name_LG            -0.0640      0.042     -1.523      0.128      -0.146       0.018
brand_name_Lava           0.0265      0.060      0.446      0.656      -0.090       0.143
brand_name_Lenovo        -0.0479      0.043     -1.107      0.268      -0.133       0.037
brand_name_Meizu         -0.0744      0.052     -1.435      0.151      -0.176       0.027
brand_name_Micromax      -0.0476      0.045     -1.047      0.295      -0.137       0.042
brand_name_Microsoft      0.0213      0.077      0.276      0.783      -0.130       0.173
brand_name_Motorola      -0.1032      0.046     -2.233      0.026      -0.194      -0.013
brand_name_Nokia         -0.0147      0.047     -0.315      0.753      -0.106       0.077
brand_name_OnePlus       -0.0854      0.066     -1.286      0.199      -0.216       0.045
brand_name_Oppo          -0.0088      0.046     -0.189      0.850      -0.100       0.082
brand_name_Others        -0.0621      0.039     -1.575      0.115      -0.139       0.015
brand_name_Panasonic     -0.0761      0.052     -1.458      0.145      -0.178       0.026
brand_name_Realme        -0.0612      0.060     -1.024      0.306      -0.178       0.056
brand_name_Samsung       -0.0529      0.040     -1.306      0.192      -0.132       0.027
brand_name_Sony          -0.1157      0.048     -2.395      0.017      -0.210      -0.021
brand_name_Spice         -0.0987      0.061     -1.622      0.105      -0.218       0.021
brand_name_Vivo          -0.0977      0.045     -2.152      0.032      -0.187      -0.009
brand_name_XOLO          -0.1041      0.052     -1.983      0.048      -0.207      -0.001
brand_name_Xiaomi        -0.0106      0.045     -0.235      0.814      -0.099       0.078
brand_name_ZTE           -0.0640      0.044     -1.456      0.146      -0.150       0.022
os_Others                 0.0197      0.032      0.614      0.539      -0.043       0.083
os_Windows               -0.0228      0.041     -0.561      0.575      -0.103       0.057
os_iOS                    0.0157      0.034      0.465      0.642      -0.050       0.082
==============================================================================
Omnibus:                       99.237   Durbin-Watson:                   1.976
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              119.961
Skew:                          -0.486   Prob(JB):                     8.93e-27
Kurtosis:                       3.614   Cond. No.                     6.70e+17
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 4.92e-26. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
In [112]:
# Make predictions
x_test1 = x_test.drop(["years_old"], axis=1)
y_train_pred = olsmodel1.predict(x_train1)
y_test_pred = olsmodel1.predict(x_test1)

# Calculate metrics for training data
rmse_train = np.sqrt(mean_squared_error(y_train, y_train_pred))
mae_train = mean_absolute_error(y_train, y_train_pred)
r2_train = r2_score(y_train, y_train_pred)
adj_r2_train = 1 - (1-r2_train)*(len(y_train)-1)/(len(y_train)-x_train.shape[1]-1)
mape_train = mean_absolute_percentage_error(y_train, y_train_pred)

# Calculate metrics for testing data
rmse_test = np.sqrt(mean_squared_error(y_test, y_test_pred))
mae_test = mean_absolute_error(y_test, y_test_pred)
r2_test = r2_score(y_test, y_test_pred)
adj_r2_test = 1 - (1-r2_test)*(len(y_test)-1)/(len(y_test)-x_test.shape[1]-1)
mape_test = mean_absolute_percentage_error(y_test, y_test_pred)

# Append the results to the DataFrame
compare_df = compare_df.append({
        'Model': "olsmodel1",
        'RMSE_Train': rmse_train,
        'MAE_Train': mae_train,
        'R2_Train': r2_train,
        'Adj_R2_Train': adj_r2_train,
        'MAPE_Train': mape_train,
        'RMSE_Test': rmse_test,
        'MAE_Test': mae_test,
        'R2_Test': r2_test,
        'Adj_R2_Test': adj_r2_test,
        'MAPE_Test': mape_test
    }, ignore_index=True)

compare_df
Out[112]:
Model RMSE_Train MAE_Train R2_Train Adj_R2_Train MAPE_Train RMSE_Test MAE_Test R2_Test Adj_R2_Test MAPE_Test
0 olsmodel 0.2141 0.1704 0.8214 0.8172 4.0070 0.2140 0.1695 0.8061 0.7951 4.0127
1 olsmodel1 0.2142 0.1704 0.8214 0.8172 4.0071 0.2139 0.1694 0.8062 0.7952 4.0094
In [113]:
# Check for No Multicollinearity on olsmodel1
pd.Series([variance_inflation_factor(x_train1.values, i) for i in range(x_train1.shape[1])],index=x_train1.columns)
Out[113]:
const                   0.0000
screen_size             4.3797
selfie_camera_mp        3.0842
int_memory              2.4111
ram                     0.0000
battery                 4.0356
weight                  3.8093
days_used               1.8421
normalized_new_price    2.6237
main_camera_mp_imp      2.5315
tech_4G5G                  inf
tech_4G                    inf
tech_2G3G                  inf
brand_name_Alcatel      3.3386
brand_name_Apple           inf
brand_name_Asus         3.1137
brand_name_BlackBerry   1.5899
brand_name_Celkon       1.4314
brand_name_Coolpad      1.5233
brand_name_Gionee       2.0656
brand_name_Google       1.1943
brand_name_HTC          3.1420
brand_name_Honor        2.9431
brand_name_Huawei       5.1141
brand_name_Infinix      1.2136
brand_name_Karbonn      1.6658
brand_name_LG           4.8667
brand_name_Lava         1.6475
brand_name_Lenovo       3.8153
brand_name_Meizu        2.1945
brand_name_Micromax     3.0309
brand_name_Microsoft    1.9012
brand_name_Motorola     3.1325
brand_name_Nokia        3.5141
brand_name_OnePlus      1.5888
brand_name_Oppo         3.0652
brand_name_Others       8.5834
brand_name_Panasonic    2.0561
brand_name_Realme       1.8090
brand_name_Samsung      6.6920
brand_name_Sony         2.7647
brand_name_Spice        1.6444
brand_name_Vivo         3.5522
brand_name_XOLO         2.0246
brand_name_Xiaomi       3.6126
brand_name_ZTE          3.7511
os_Others               1.5076
os_Windows              1.6580
os_iOS                     inf
dtype: float64

NOTES:

  • The variable "years_old" have VIF value 5.31, indicating presence of strong multicollinearity
  • This is the only variable to be elimitaned as we will ignore the VIF values for dummy variables and the constant (intercept)
  • After droping the variable "years_old" we have all VIF values under the threshold of 5, then is verified No Multicollinearity in the model "olsmodel1"
  • Now that we do not have multicollinearity in our data, the p-values of the coefficients have become reliable and we can remove the non-significant predictor variables.

Dropping high p-value variables¶

In [114]:
# Extracting the relevant statistics
variable_names = x_train1.columns
coefficients = olsmodel1.params
p_values = olsmodel1.pvalues
standard_errors = olsmodel1.bse
t_values = olsmodel1.tvalues

# Creating a dictionary with the statistics
results_dict = {
    'Variable': variable_names,
    'Coefficient': coefficients,
    'Standard Error': standard_errors,
    'T-value': t_values,
    'P-value': p_values
}

# Creating a DataFrame from the dictionary
results1_df = pd.DataFrame(results_dict)

# Print the resulting DataFrame
results1_df
Out[114]:
Variable Coefficient Standard Error T-value P-value
const const 0.0878 0.0042 20.8418 0.0000
screen_size screen_size 0.0303 0.0039 7.8651 0.0000
selfie_camera_mp selfie_camera_mp 0.0161 0.0016 9.9838 0.0000
int_memory int_memory 0.0017 0.0002 8.4453 0.0000
ram ram 0.3513 0.0169 20.8418 0.0000
battery battery -0.0000 0.0000 -0.5092 0.6106
weight weight 0.0019 0.0003 7.1666 0.0000
days_used days_used 0.0000 0.0000 1.8191 0.0690
normalized_new_price normalized_new_price 0.3319 0.0120 27.5813 0.0000
main_camera_mp_imp main_camera_mp_imp 0.0253 0.0016 15.6777 0.0000
tech_4G5G tech_4G5G 0.0348 0.0221 1.5754 0.1153
tech_4G tech_4G 0.0571 0.0112 5.1042 0.0000
tech_2G3G tech_2G3G -0.0041 0.0142 -0.2857 0.7751
brand_name_Alcatel brand_name_Alcatel -0.0733 0.0443 -1.6565 0.0978
brand_name_Apple brand_name_Apple 0.0157 0.0337 0.4646 0.6423
brand_name_Asus brand_name_Asus -0.0306 0.0452 -0.6765 0.4988
brand_name_BlackBerry brand_name_BlackBerry -0.1543 0.0685 -2.2526 0.0244
brand_name_Celkon brand_name_Celkon -0.0025 0.0720 -0.0350 0.9721
brand_name_Coolpad brand_name_Coolpad -0.0687 0.0650 -1.0562 0.2910
brand_name_Gionee brand_name_Gionee -0.0511 0.0523 -0.9779 0.3282
brand_name_Google brand_name_Google 0.0073 0.0967 0.0754 0.9399
brand_name_HTC brand_name_HTC -0.0393 0.0454 -0.8652 0.3870
brand_name_Honor brand_name_Honor -0.0539 0.0471 -1.1448 0.2524
brand_name_Huawei brand_name_Huawei -0.0673 0.0419 -1.6040 0.1089
brand_name_Infinix brand_name_Infinix -0.0186 0.0975 -0.1906 0.8489
brand_name_Karbonn brand_name_Karbonn -0.0430 0.0599 -0.7182 0.4727
brand_name_LG brand_name_LG -0.0640 0.0420 -1.5230 0.1279
brand_name_Lava brand_name_Lava 0.0265 0.0595 0.4457 0.6559
brand_name_Lenovo brand_name_Lenovo -0.0479 0.0433 -1.1074 0.2682
brand_name_Meizu brand_name_Meizu -0.0744 0.0518 -1.4352 0.1514
brand_name_Micromax brand_name_Micromax -0.0476 0.0455 -1.0466 0.2954
brand_name_Microsoft brand_name_Microsoft 0.0213 0.0773 0.2760 0.7826
brand_name_Motorola brand_name_Motorola -0.1032 0.0462 -2.2334 0.0256
brand_name_Nokia brand_name_Nokia -0.0147 0.0468 -0.3146 0.7531
brand_name_OnePlus brand_name_OnePlus -0.0854 0.0664 -1.2859 0.1986
brand_name_Oppo brand_name_Oppo -0.0088 0.0464 -0.1891 0.8500
brand_name_Others brand_name_Others -0.0621 0.0394 -1.5747 0.1155
brand_name_Panasonic brand_name_Panasonic -0.0761 0.0522 -1.4584 0.1449
brand_name_Realme brand_name_Realme -0.0612 0.0598 -1.0242 0.3058
brand_name_Samsung brand_name_Samsung -0.0529 0.0405 -1.3059 0.1917
brand_name_Sony brand_name_Sony -0.1157 0.0483 -2.3946 0.0167
brand_name_Spice brand_name_Spice -0.0987 0.0609 -1.6222 0.1049
brand_name_Vivo brand_name_Vivo -0.0977 0.0454 -2.1519 0.0315
brand_name_XOLO brand_name_XOLO -0.1041 0.0525 -1.9826 0.0475
brand_name_Xiaomi brand_name_Xiaomi -0.0106 0.0453 -0.2351 0.8142
brand_name_ZTE brand_name_ZTE -0.0640 0.0440 -1.4561 0.1455
os_Others os_Others 0.0197 0.0321 0.6141 0.5392
os_Windows os_Windows -0.0228 0.0407 -0.5611 0.5748
os_iOS os_iOS 0.0157 0.0337 0.4646 0.6423
In [115]:
results1_df['Variable'][results1_df['P-value']>=0.05].tolist()
Out[115]:
['battery',
 'days_used',
 'tech_4G5G',
 'tech_2G3G',
 'brand_name_Alcatel',
 'brand_name_Apple',
 'brand_name_Asus',
 'brand_name_Celkon',
 'brand_name_Coolpad',
 'brand_name_Gionee',
 'brand_name_Google',
 'brand_name_HTC',
 'brand_name_Honor',
 'brand_name_Huawei',
 'brand_name_Infinix',
 'brand_name_Karbonn',
 'brand_name_LG',
 'brand_name_Lava',
 'brand_name_Lenovo',
 'brand_name_Meizu',
 'brand_name_Micromax',
 'brand_name_Microsoft',
 'brand_name_Nokia',
 'brand_name_OnePlus',
 'brand_name_Oppo',
 'brand_name_Others',
 'brand_name_Panasonic',
 'brand_name_Realme',
 'brand_name_Samsung',
 'brand_name_Spice',
 'brand_name_Xiaomi',
 'brand_name_ZTE',
 'os_Others',
 'os_Windows',
 'os_iOS']
In [116]:
results1_df['Variable'][results1_df['P-value']<0.05].tolist()
Out[116]:
['const',
 'screen_size',
 'selfie_camera_mp',
 'int_memory',
 'ram',
 'weight',
 'normalized_new_price',
 'main_camera_mp_imp',
 'tech_4G',
 'brand_name_BlackBerry',
 'brand_name_Motorola',
 'brand_name_Sony',
 'brand_name_Vivo',
 'brand_name_XOLO']
In [117]:
x_train2=x_train1.loc[:,results1_df['Variable'][results1_df['P-value']<0.05].tolist()]
x_train2
Out[117]:
const screen_size selfie_camera_mp int_memory ram weight normalized_new_price main_camera_mp_imp tech_4G brand_name_BlackBerry brand_name_Motorola brand_name_Sony brand_name_Vivo brand_name_XOLO
2260 1.0000 12.7000 2.0000 16.0000 4.0000 130.0000 5.2987 8.0000 0 0 0 0 0 0
427 1.0000 19.2500 2.0000 32.0000 4.0000 244.5000 5.8621 5.0000 0 0 0 0 0 0
1076 1.0000 8.8150 0.3000 16.0000 4.0000 142.9000 4.0972 3.1500 0 0 0 0 0 0
1326 1.0000 12.7000 5.0000 16.0000 4.0000 145.0000 5.3118 13.0000 1 0 0 0 0 0
1722 1.0000 12.8800 1.3000 32.0000 4.0000 167.0000 5.5991 8.0000 1 0 0 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
3046 1.0000 10.2400 0.3000 16.0000 4.0000 150.0000 4.9396 5.0000 0 0 0 0 0 1
1056 1.0000 12.7500 8.0000 16.0000 4.0000 125.3000 5.9857 13.0000 1 0 0 0 0 0
1254 1.0000 10.1600 1.6000 16.0000 4.0000 107.0000 5.1962 5.0000 0 0 0 0 0 0
293 1.0000 15.3200 16.0000 64.0000 4.0000 164.0000 5.0659 8.0000 1 0 0 0 0 0
1219 1.0000 12.7000 5.0000 32.0000 4.0000 160.0000 5.7052 4.0000 1 0 0 0 0 0

2177 rows × 14 columns

In [118]:
olsmodel2 = sm.OLS(y_train, x_train2).fit()
print(olsmodel2.summary())
                              OLS Regression Results                             
=================================================================================
Dep. Variable:     normalized_used_price   R-squared:                       0.819
Model:                               OLS   Adj. R-squared:                  0.818
Method:                    Least Squares   F-statistic:                     816.4
Date:                   Sat, 08 Jun 2024   Prob (F-statistic):               0.00
Time:                           04:51:55   Log-Likelihood:                 251.99
No. Observations:                   2177   AIC:                            -478.0
Df Residuals:                       2164   BIC:                            -404.1
Df Model:                             12                                         
Covariance Type:               nonrobust                                         
=========================================================================================
                            coef    std err          t      P>|t|      [0.025      0.975]
-----------------------------------------------------------------------------------------
const                     0.0871      0.003     30.565      0.000       0.081       0.093
screen_size               0.0286      0.003      8.253      0.000       0.022       0.035
selfie_camera_mp          0.0156      0.001     10.513      0.000       0.013       0.019
int_memory                0.0017      0.000      9.085      0.000       0.001       0.002
ram                       0.3482      0.011     30.565      0.000       0.326       0.371
weight                    0.0018      0.000      7.673      0.000       0.001       0.002
normalized_new_price      0.3365      0.010     32.244      0.000       0.316       0.357
main_camera_mp_imp        0.0250      0.001     16.848      0.000       0.022       0.028
tech_4G                   0.0526      0.012      4.402      0.000       0.029       0.076
brand_name_BlackBerry    -0.0983      0.055     -1.796      0.073      -0.206       0.009
brand_name_Motorola      -0.0593      0.027     -2.231      0.026      -0.111      -0.007
brand_name_Sony          -0.0655      0.030     -2.175      0.030      -0.125      -0.006
brand_name_Vivo          -0.0503      0.025     -2.018      0.044      -0.099      -0.001
brand_name_XOLO          -0.0595      0.037     -1.588      0.112      -0.133       0.014
==============================================================================
Omnibus:                      100.266   Durbin-Watson:                   1.979
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              123.321
Skew:                          -0.482   Prob(JB):                     1.66e-27
Kurtosis:                       3.657   Cond. No.                     6.06e+16
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 1.77e-26. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
In [119]:
# Make predictions
x_test2=x_test1.loc[:,results1_df['Variable'][results1_df['P-value']<0.05].tolist()]
y_train_pred = olsmodel2.predict(x_train2)
y_test_pred = olsmodel2.predict(x_test2)

# Calculate metrics for training data
rmse_train = np.sqrt(mean_squared_error(y_train, y_train_pred))
mae_train = mean_absolute_error(y_train, y_train_pred)
r2_train = r2_score(y_train, y_train_pred)
adj_r2_train = 1 - (1-r2_train)*(len(y_train)-1)/(len(y_train)-x_train.shape[1]-1)
mape_train = mean_absolute_percentage_error(y_train, y_train_pred)

# Calculate metrics for testing data
rmse_test = np.sqrt(mean_squared_error(y_test, y_test_pred))
mae_test = mean_absolute_error(y_test, y_test_pred)
r2_test = r2_score(y_test, y_test_pred)
adj_r2_test = 1 - (1-r2_test)*(len(y_test)-1)/(len(y_test)-x_test.shape[1]-1)
mape_test = mean_absolute_percentage_error(y_test, y_test_pred)

# Append the results to the DataFrame
compare_df = compare_df.append({
        'Model': "olsmodel2",
        'RMSE_Train': rmse_train,
        'MAE_Train': mae_train,
        'R2_Train': r2_train,
        'Adj_R2_Train': adj_r2_train,
        'MAPE_Train': mape_train,
        'RMSE_Test': rmse_test,
        'MAE_Test': mae_test,
        'R2_Test': r2_test,
        'Adj_R2_Test': adj_r2_test,
        'MAPE_Test': mape_test
    }, ignore_index=True)

compare_df
Out[119]:
Model RMSE_Train MAE_Train R2_Train Adj_R2_Train MAPE_Train RMSE_Test MAE_Test R2_Test Adj_R2_Test MAPE_Test
0 olsmodel 0.2141 0.1704 0.8214 0.8172 4.0070 0.2140 0.1695 0.8061 0.7951 4.0127
1 olsmodel1 0.2142 0.1704 0.8214 0.8172 4.0071 0.2139 0.1694 0.8062 0.7952 4.0094
2 olsmodel2 0.2155 0.1715 0.8191 0.8148 4.0363 0.2127 0.1684 0.8084 0.7976 3.9857
In [120]:
# Extracting the relevant statistics
variable_names = x_train2.columns
coefficients = olsmodel2.params
p_values = olsmodel2.pvalues
standard_errors = olsmodel2.bse
t_values = olsmodel2.tvalues

# Creating a dictionary with the statistics
results_dict = {
    'Variable': variable_names,
    'Coefficient': coefficients,
    'Standard Error': standard_errors,
    'T-value': t_values,
    'P-value': p_values
}

# Creating a DataFrame from the dictionary
results2_df = pd.DataFrame(results_dict)

# Print the resulting DataFrame
results2_df
Out[120]:
Variable Coefficient Standard Error T-value P-value
const const 0.0871 0.0028 30.5652 0.0000
screen_size screen_size 0.0286 0.0035 8.2531 0.0000
selfie_camera_mp selfie_camera_mp 0.0156 0.0015 10.5129 0.0000
int_memory int_memory 0.0017 0.0002 9.0847 0.0000
ram ram 0.3482 0.0114 30.5652 0.0000
weight weight 0.0018 0.0002 7.6726 0.0000
normalized_new_price normalized_new_price 0.3365 0.0104 32.2443 0.0000
main_camera_mp_imp main_camera_mp_imp 0.0250 0.0015 16.8484 0.0000
tech_4G tech_4G 0.0526 0.0120 4.4020 0.0000
brand_name_BlackBerry brand_name_BlackBerry -0.0983 0.0547 -1.7964 0.0726
brand_name_Motorola brand_name_Motorola -0.0593 0.0266 -2.2308 0.0258
brand_name_Sony brand_name_Sony -0.0655 0.0301 -2.1749 0.0297
brand_name_Vivo brand_name_Vivo -0.0503 0.0249 -2.0180 0.0437
brand_name_XOLO brand_name_XOLO -0.0595 0.0375 -1.5880 0.1124
In [121]:
results2_df['Variable'][results2_df['P-value']>=0.05].tolist()
Out[121]:
['brand_name_BlackBerry', 'brand_name_XOLO']
In [122]:
results2_df['Variable'][results2_df['P-value']<0.05].tolist()
Out[122]:
['const',
 'screen_size',
 'selfie_camera_mp',
 'int_memory',
 'ram',
 'weight',
 'normalized_new_price',
 'main_camera_mp_imp',
 'tech_4G',
 'brand_name_Motorola',
 'brand_name_Sony',
 'brand_name_Vivo']
In [123]:
x_train3=x_train2.loc[:,results2_df['Variable'][results2_df['P-value']<0.05].tolist()]
x_train3
Out[123]:
const screen_size selfie_camera_mp int_memory ram weight normalized_new_price main_camera_mp_imp tech_4G brand_name_Motorola brand_name_Sony brand_name_Vivo
2260 1.0000 12.7000 2.0000 16.0000 4.0000 130.0000 5.2987 8.0000 0 0 0 0
427 1.0000 19.2500 2.0000 32.0000 4.0000 244.5000 5.8621 5.0000 0 0 0 0
1076 1.0000 8.8150 0.3000 16.0000 4.0000 142.9000 4.0972 3.1500 0 0 0 0
1326 1.0000 12.7000 5.0000 16.0000 4.0000 145.0000 5.3118 13.0000 1 0 0 0
1722 1.0000 12.8800 1.3000 32.0000 4.0000 167.0000 5.5991 8.0000 1 0 0 0
... ... ... ... ... ... ... ... ... ... ... ... ...
3046 1.0000 10.2400 0.3000 16.0000 4.0000 150.0000 4.9396 5.0000 0 0 0 0
1056 1.0000 12.7500 8.0000 16.0000 4.0000 125.3000 5.9857 13.0000 1 0 0 0
1254 1.0000 10.1600 1.6000 16.0000 4.0000 107.0000 5.1962 5.0000 0 0 0 0
293 1.0000 15.3200 16.0000 64.0000 4.0000 164.0000 5.0659 8.0000 1 0 0 0
1219 1.0000 12.7000 5.0000 32.0000 4.0000 160.0000 5.7052 4.0000 1 0 0 0

2177 rows × 12 columns

In [124]:
olsmodel3 = sm.OLS(y_train, x_train3).fit()
print(olsmodel3.summary())
                              OLS Regression Results                             
=================================================================================
Dep. Variable:     normalized_used_price   R-squared:                       0.819
Model:                               OLS   Adj. R-squared:                  0.818
Method:                    Least Squares   F-statistic:                     977.4
Date:                   Sat, 08 Jun 2024   Prob (F-statistic):               0.00
Time:                           04:51:56   Log-Likelihood:                 249.11
No. Observations:                   2177   AIC:                            -476.2
Df Residuals:                       2166   BIC:                            -413.7
Df Model:                             10                                         
Covariance Type:               nonrobust                                         
========================================================================================
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
const                    0.0866      0.003     30.452      0.000       0.081       0.092
screen_size              0.0291      0.003      8.454      0.000       0.022       0.036
selfie_camera_mp         0.0157      0.001     10.557      0.000       0.013       0.019
int_memory               0.0017      0.000      9.079      0.000       0.001       0.002
ram                      0.3465      0.011     30.452      0.000       0.324       0.369
weight                   0.0018      0.000      7.630      0.000       0.001       0.002
normalized_new_price     0.3369      0.010     32.257      0.000       0.316       0.357
main_camera_mp_imp       0.0249      0.001     16.738      0.000       0.022       0.028
tech_4G                  0.0540      0.012      4.572      0.000       0.031       0.077
brand_name_Motorola     -0.0576      0.027     -2.166      0.030      -0.110      -0.005
brand_name_Sony         -0.0628      0.030     -2.085      0.037      -0.122      -0.004
brand_name_Vivo         -0.0495      0.025     -1.987      0.047      -0.098      -0.001
==============================================================================
Omnibus:                       98.638   Durbin-Watson:                   1.978
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              121.513
Skew:                          -0.475   Prob(JB):                     4.11e-27
Kurtosis:                       3.660   Cond. No.                     6.02e+16
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 1.79e-26. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
In [125]:
# Make predictions
x_test3=x_test2.loc[:,results2_df['Variable'][results2_df['P-value']<0.05].tolist()]
y_train_pred = olsmodel3.predict(x_train3)
y_test_pred = olsmodel3.predict(x_test3)

# Calculate metrics for training data
rmse_train = np.sqrt(mean_squared_error(y_train, y_train_pred))
mae_train = mean_absolute_error(y_train, y_train_pred)
r2_train = r2_score(y_train, y_train_pred)
adj_r2_train = 1 - (1-r2_train)*(len(y_train)-1)/(len(y_train)-x_train.shape[1]-1)
mape_train = mean_absolute_percentage_error(y_train, y_train_pred)

# Calculate metrics for testing data
rmse_test = np.sqrt(mean_squared_error(y_test, y_test_pred))
mae_test = mean_absolute_error(y_test, y_test_pred)
r2_test = r2_score(y_test, y_test_pred)
adj_r2_test = 1 - (1-r2_test)*(len(y_test)-1)/(len(y_test)-x_test.shape[1]-1)
mape_test = mean_absolute_percentage_error(y_test, y_test_pred)

# Append the results to the DataFrame
compare_df = compare_df.append({
        'Model': "olsmodel3",
        'RMSE_Train': rmse_train,
        'MAE_Train': mae_train,
        'R2_Train': r2_train,
        'Adj_R2_Train': adj_r2_train,
        'MAPE_Train': mape_train,
        'RMSE_Test': rmse_test,
        'MAE_Test': mae_test,
        'R2_Test': r2_test,
        'Adj_R2_Test': adj_r2_test,
        'MAPE_Test': mape_test
    }, ignore_index=True)

compare_df
Out[125]:
Model RMSE_Train MAE_Train R2_Train Adj_R2_Train MAPE_Train RMSE_Test MAE_Test R2_Test Adj_R2_Test MAPE_Test
0 olsmodel 0.2141 0.1704 0.8214 0.8172 4.0070 0.2140 0.1695 0.8061 0.7951 4.0127
1 olsmodel1 0.2142 0.1704 0.8214 0.8172 4.0071 0.2139 0.1694 0.8062 0.7952 4.0094
2 olsmodel2 0.2155 0.1715 0.8191 0.8148 4.0363 0.2127 0.1684 0.8084 0.7976 3.9857
3 olsmodel3 0.2158 0.1718 0.8186 0.8143 4.0424 0.2122 0.1681 0.8092 0.7984 3.9762

NOTES:

  • In olsmodel2, two variables 'brand_name_BlackBerry' and 'brand_name_XOLO' had an increase on p-value. A second iteration required.
  • Now no feature has p-value greater than 0.05, so we'll consider the features in x_train3 as the final set of predictor variables and olsmodel3 as the final model to move forward with.
  • Now adjusted R-squared is 0.8186, i.e., our model is able to explain ~82% of the variance
  • The adjusted R-squared in olsmod1 (where we considered the variables without multicollinearity) was 0.8214
  • This shows that dropping the variables had a minor impact the model
  • RMSE and MAE values are comparable for train and test sets, indicating that the model is not overfitting

Test for Linearity and Independence¶

In [126]:
#first step is create a dataframe for checks, with actual, fitted and residual values
check3_df = pd.DataFrame()
check3_df["Actual Values"] = y_train.values.flatten() # actual values
check3_df["Fitted Values"] = olsmodel3.fittedvalues.values # predicted values
check3_df["Residuals"] = olsmodel3.resid.values # residuals
check3_df.head()
Out[126]:
Actual Values Fitted Values Residuals
0 4.3372 4.1190 0.2182
1 4.9988 4.6586 0.3402
2 3.3351 3.4772 -0.1421
3 4.1752 4.3758 -0.2006
4 4.1854 4.3624 -0.1770
In [127]:
# let us plot the fitted values vs residuals
sns.set_style("whitegrid")
sns.residplot(
    data=check3_df, x="Fitted Values", y="Residuals", color="purple", lowess=True
)
plt.xlabel("Fitted Values")
plt.ylabel("Residuals")
plt.title("Fitted vs Residual plot")
plt.show()
No description has been provided for this image

NOTES:

  • The plot shows the distribution of residuals (errors) vs fitted values (predicted values).
  • If there exist any pattern in this plot, we consider it as signs of non-linearity in the data and a pattern means that the model doesn't capture non-linear effects.
  • We see no pattern in the plot above. Hence, the assumptions of linearity and independence are satisfied.

Test for Normality¶

In [128]:
#Plot the distribution of residuals
sns.histplot(check3_df["Residuals"], kde=True)
plt.title("Normality of residuals")
plt.show()
No description has been provided for this image
In [129]:
# QQplot
stats.probplot(check3_df["Residuals"], dist="norm", plot=pylab)
plt.show()
No description has been provided for this image
In [130]:
stats.shapiro(check3_df["Residuals"])
Out[130]:
ShapiroResult(statistic=0.9832592606544495, pvalue=2.598801055470469e-15)

NOTES:

  • Since p-value < 0.05, the residuals are not normal as per the Shapiro-Wilk test.
  • Strictly speaking, the residuals are not normal.
  • However, as an approximation, we can accept this distribution as close to being normal.
  • So, the assumption of normality is satisfied.

Test for Homoscedasticity¶

In [131]:
name = ["F statistic", "p-value"]
test = sms.het_goldfeldquandt(check3_df["Residuals"], x_train3)
lzip(name, test)
Out[131]:
[('F statistic', 1.0289166941528927), ('p-value', 0.3199777521407358)]

NOTES:

  • Test for homoscedasticity by using the goldfeldquandt test
  • If we get a p-value greater than 0.05, we can say that the residuals are homoscedastic. Otherwise, they are heteroscedastic.
  • As per the goldfeldquandt test results, we can conclude the assumption of Heteroscedasticity is satisfied.

Consolidated Notes from Linear Regression assumptions¶

Multicollinearity

  • The variable "years_old" have VIF value 5.31, indicating presence of strong multicollinearity
  • This is the only variable to be elimitaned as we will ignore the VIF values for dummy variables and the constant (intercept)
  • After droping the variable "years_old" we have all VIF values under the threshold of 5, then is verified No Multicollinearity in the model "olsmodel1"
  • Now that we do not have multicollinearity in our data, the p-values of the coefficients have become reliable and we can remove the non-significant predictor variables.
  • In olsmodel2, two variables 'brand_name_BlackBerry' and 'brand_name_XOLO' had an increase on p-value. A second iteration required.
  • Now no feature has p-value greater than 0.05, so we'll consider the features in x_train3 as the final set of predictor variables and olsmodel3 as the final model to move forward with.
  • Now adjusted R-squared is 0.8186, i.e., our model is able to explain ~82% of the variance
  • The adjusted R-squared in olsmod1 (where we considered the variables without multicollinearity) was 0.8214
  • This shows that dropping the variables had a minor impact the model
  • RMSE and MAE values are comparable for train and test sets, indicating that the model is not overfitting

Linearity of variables and Independence of error terms

  • The plot shows the distribution of residuals (errors) vs fitted values (predicted values).
  • If there exist any pattern in this plot, we consider it as signs of non-linearity in the data and a pattern means that the model doesn't capture non-linear effects.
  • We see no pattern in the plot above. Hence, the assumptions of linearity and independence are satisfied.

Normality of error terms

  • Since p-value < 0.05, the residuals are not normal as per the Shapiro-Wilk test.
  • Strictly speaking, the residuals are not normal.
  • However, as an approximation, we can accept this distribution as close to being normal.
  • So, the assumption of normality is satisfied.

Heteroscedasticity

  • Test for homoscedasticity by using the goldfeldquandt test
  • If we get a p-value greater than 0.05, we can say that the residuals are homoscedastic. Otherwise, they are heteroscedastic.
  • As per the goldfeldquandt test results, we can conclude the assumption of Heteroscedasticity is satisfied.

Final Model¶

In [132]:
print(olsmodel3.summary())
                              OLS Regression Results                             
=================================================================================
Dep. Variable:     normalized_used_price   R-squared:                       0.819
Model:                               OLS   Adj. R-squared:                  0.818
Method:                    Least Squares   F-statistic:                     977.4
Date:                   Sat, 08 Jun 2024   Prob (F-statistic):               0.00
Time:                           04:51:58   Log-Likelihood:                 249.11
No. Observations:                   2177   AIC:                            -476.2
Df Residuals:                       2166   BIC:                            -413.7
Df Model:                             10                                         
Covariance Type:               nonrobust                                         
========================================================================================
                           coef    std err          t      P>|t|      [0.025      0.975]
----------------------------------------------------------------------------------------
const                    0.0866      0.003     30.452      0.000       0.081       0.092
screen_size              0.0291      0.003      8.454      0.000       0.022       0.036
selfie_camera_mp         0.0157      0.001     10.557      0.000       0.013       0.019
int_memory               0.0017      0.000      9.079      0.000       0.001       0.002
ram                      0.3465      0.011     30.452      0.000       0.324       0.369
weight                   0.0018      0.000      7.630      0.000       0.001       0.002
normalized_new_price     0.3369      0.010     32.257      0.000       0.316       0.357
main_camera_mp_imp       0.0249      0.001     16.738      0.000       0.022       0.028
tech_4G                  0.0540      0.012      4.572      0.000       0.031       0.077
brand_name_Motorola     -0.0576      0.027     -2.166      0.030      -0.110      -0.005
brand_name_Sony         -0.0628      0.030     -2.085      0.037      -0.122      -0.004
brand_name_Vivo         -0.0495      0.025     -1.987      0.047      -0.098      -0.001
==============================================================================
Omnibus:                       98.638   Durbin-Watson:                   1.978
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              121.513
Skew:                          -0.475   Prob(JB):                     4.11e-27
Kurtosis:                       3.660   Cond. No.                     6.02e+16
==============================================================================

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The smallest eigenvalue is 1.79e-26. This might indicate that there are
strong multicollinearity problems or that the design matrix is singular.
In [133]:
compare_df
Out[133]:
Model RMSE_Train MAE_Train R2_Train Adj_R2_Train MAPE_Train RMSE_Test MAE_Test R2_Test Adj_R2_Test MAPE_Test
0 olsmodel 0.2141 0.1704 0.8214 0.8172 4.0070 0.2140 0.1695 0.8061 0.7951 4.0127
1 olsmodel1 0.2142 0.1704 0.8214 0.8172 4.0071 0.2139 0.1694 0.8062 0.7952 4.0094
2 olsmodel2 0.2155 0.1715 0.8191 0.8148 4.0363 0.2127 0.1684 0.8084 0.7976 3.9857
3 olsmodel3 0.2158 0.1718 0.8186 0.8143 4.0424 0.2122 0.1681 0.8092 0.7984 3.9762
In [134]:
# Extracting the relevant statistics
variable_names = x_train3.columns
coefficients = olsmodel3.params
p_values = olsmodel3.pvalues
standard_errors = olsmodel3.bse
t_values = olsmodel3.tvalues

# Creating a dictionary with the statistics
results_dict = {
    'Variable': variable_names,
    'Coefficient': coefficients,
    'Standard Error': standard_errors,
    'T-value': t_values,
    'P-value': p_values
}

# Creating a DataFrame from the dictionary
results3_df = pd.DataFrame(results_dict)

# Print the resulting DataFrame
results3_df.sort_values('Coefficient', ascending=False)['Coefficient'].head(4)
Out[134]:
ram                    0.3465
normalized_new_price   0.3369
const                  0.0866
tech_4G                0.0540
Name: Coefficient, dtype: float64

Actionable Insights and Recommendations¶

Insights

  • The model is able to explain ~82% of the variation in the data and within 4% of the used device price.
  • This indicates that the model is good for prediction as well as inference purposes
  • If the Amount of RAM in GB of the device increases by one unit, then the Normalized price of the used/refurbished device in euros increases by 0.3465 units, all other variables held constant
  • If the Normalized price of a new device of the same model in euros of the device increases by one unit, then the Normalized price of the used/refurbished device in euros increases by 0.3369 units, all other variables held constant
  • If 4G is available on the device, then the Normalized price of the used/refurbished device in euros increases by 0.054 units, all other variables held constant

Recommendations

  • ReCell might focus on 4G devices.
    • focus on 5G devices should be delayed, as this variable as minimal impact on price as per data.
    • Considering 5G being the latest technology, most of units available are new and used market is still low.
    • 2G, 3G devices should be avoided. Today's utilization of the devices requires at least 4G technology for satisfactory user experience.
  • ReCell might focus on high value devices, as this units have better resell value
  • ReCell should focus on RAM memory as device characteristics.
    • As each brand might add a number of particular features and functionalities to their products
    • ReCell can use RAM Memory as a common (all unites have it) and relatable metric for evaluation by ReCell users.
In [135]:
#create html version
!jupyter nbconvert --to html SLF_Project_LearnerNotebook_FullCode.ipynb
[NbConvertApp] Converting notebook SLF_Project_LearnerNotebook_FullCode.ipynb to html
[NbConvertApp] WARNING | Alternative text is missing on 9 image(s).
[NbConvertApp] Writing 3827822 bytes to SLF_Project_LearnerNotebook_FullCode.html
In [ ]: